Data source configuration


qbo insights allows conversational access to data in a relational database without the need for the user to know SQL. In order to provide this access, qbo insights has a unique process for understanding the contents of a database, deciding on the kinds of queries that can be asked of the database, exposing the possible queries to the end-user through auto-completions and auto-suggestions, and training the natural language processing to understand users’ utterances in the context of the data present in the database. Most of this process is automated; but this process still benefits from various kinds of information and hints about the data model from a developer or data architect.

qbo insights supports a wide variety of queries on relational databases utilizing all common SQL constructs, including SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, JOIN, AND, OR, NOT, LIKE, HAVING, UNION, etc. It also supports various date related functions, nested queries, queries involving multiple clauses, binning, various aggregation functions, etc. It also supports querying different kinds of tables including fact tables, dimension tables, junction tables, etc. and schemas like star and snowflake schemas.

One can hookup multiple data sources to qbo, the QueryBot. For instance, qbo could be connected to the sales database on MariaDB as well as with the inventory database on Oracle. When a user asks a question, it would be converted to a query and routed to the appropriate database for fetching the results. This assumes that entity names do not clash between the data sources. Furthermore, at this time qbo does not support queries that refer to data across data sources. If one needs to query across data sources, we recommend using technologies like Apache Presto.

In this section, we go through the configuration steps to add a new data source, select that tables that will be used by the QueryBot, and configure the data model and analytics against the selected tables.

New data source


When trying to add a new data source, one has to start by clicking Add a new data source card. To reach this page browse to Settings > Data Sources.

../../_images/dataModel-AddNewDraft.jpg

Related Videos

Add a new data source


On the resulting page, one has to enter a name for the data source and select the kind of data sources.

Note

qbo insights supports the following databases, which may be installed on-premise or on the cloud and file upload option:

  • Impala

  • JavaDB

  • MariaDB

  • Microsoft SQL Server

  • MySQL

  • Oracle Database

  • PostgreSQL

  • Presto

../../_images/dataModel-SupportedDataSources.jpg

Note

  • Athena

  • HubSpot

  • Salesforce

  • Snowflake

../../_images/dataModel-SupportedDataSourcesCloud.jpg

Note

  • Excel

../../_images/dataModel-SupportedDataSourcesFile.jpg

If your data sources are not listed above, please get in touch with helpdesk@unscrambl.com and the team might be able to assist you. Additional data sources are being added continuously.

The next screen asks for credentials corresponding to the selected data source.

On-premise data sources

For JavaDB/MariaDB/Microsoft SQL Server/ MySQL/Oracle Database/PostgreSQL , for instance, this would include the hostname/IP address, port, database name, username and password.

../../_images/dataModel-MariaDB.jpg

The additional information required for Impala is Kerberos Authentication Configuration and SSL Configuration. For instance, this will include Service Name and Server Certificate File Path respectively.

../../_images/dataModel-Impala.jpg

The additional information required for Presto is Kerberos Authentication Configuration that is Service Name and Principal Name, SSL Configuration that is Server Certificate File Path and Catalog information.

../../_images/dataModel-Presto.jpg

Cloud data sources

Now let’s see for cloud data sources what details are required for connection.

For connecting to cloud Athena data source Region, S3 output location, Database name, AWS access key and AWS secret key is required.

The Region is an S3 output bucket’s location must be in the same region as the connection location.

In order to get your Access Key ID and Secret Access Key follow next steps:

  - Open the IAM console
  - From the navigation menu, click Users.
  - Select your IAM user name
  - Click User Actions, and then click Manage Access Keys
  - In order to get your Access Key ID and Secret Access Key follow next steps.
  - Click Create Access Key
  - Your keys will look something like this:
    > Access key ID example: AKIAIOSFODNN7EXAMPLE
    > Secret access key example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  - Click Download Credentials
../../_images/dataModel-Cloud-Athena.jpg

For connecting to the cloud Hubspot data source HubSpot API key is required.

In your HubSpot account, click the settings icon settings in the main navigation bar.
In the left sidebar menu, navigate to Integrations > API key.
If a key has never been generated for your account, click Generate API key.
If you've already generated an API key, click Show to display your key.
../../_images/dataModel-Cloud-Hubspot.jpg

For connecting to cloud Salesforce data source Username, Password and Security token is required.

A security token is an automatically generated key that you must add to the end of your password in order to login to
Salesforce from an untrusted network.
For example, if your password is mypassword and yourmsecurity token is XXXXXXXXXX then you must enter
mypasswordXXXXXXXXXX to log in.
../../_images/dataModel-Cloud-Salesforce.jpg

For connecting to cloud Snowflake data source Account name, Database name(case sensitive) and Warehouse, Role (optional), Username and Password is required.

Account name: your account name is the full/entire string to the left of snowflakecomputing.com.
E.g. https://.snowflakecomputing.com
Database name and warehouse name on which the database is running.
../../_images/dataModel-Cloud-Snowflake.jpg

For connecting to cloud Databricks data source Hostname/IP address, Port, Database name, Http path and Password is required.

../../_images/dataModel-Cloud-Databricks.jpg

For connecting to cloud AzureSynapse data source Hostname/IP address, Port, Database name, Username and Password is required.

../../_images/dataModel-Cloud-AzureSynapse.jpg

For connecting to cloud Redshift data source Hostname/IP address, Port, Database name, Username and Password is required.

../../_images/dataModel-Cloud-Redshift.jpg

File upload

For connecting to Excel files you need to define the following options. Let’s see them in detail.

Define Tables:              Table1=Sheet1!A1:N25,Table2=Sheet2!C3:M53 (Use this format to define tables)
Upload file:                Upload your excel file (50 MB maximum size) from your desktop.
Header:                     Select this if the header can be used to identify attributes.
Orientation:                Vertical/Horizontal
Type detection scheme:      ColumnFormat/RowScan/None
Has cross sheet references: Select this option if there is a relationship between different sheets.

Tables can be defined in the following manner for excel file having more than one sheet. Let’s suppose excel file two sheets named netflix_movies data populated from column A to I have 48 rows and netflix_series data populated from column A to K have 6235 rows.

../../_images/dataModel-File-Excel.jpg

Sheets will now be added as tables.

../../_images/settings-ExcelTables.jpg

Various data types in Excel are interpreted by String, Double and Datetime in qbo insights. Let’s take an example. Refer to the following Excel sheet with various data formats.

../../_images/dataModel-ExcelSheet.jpg

These formats will be interpreted by qbo insights in the following data types.

../../_images/dataModel-ExcelFormatting.jpg

Note

  • MS Excel file should be processed in qbo, any other format for eg. google sheets or Libre Office should be first converted to MS Excel file.

  • Verify properly the data types of various Excel columns. This step is necessary so that qbo can identify data type correctly. Convert your column type to double, date, etc., as needed.

  • Note that columns of type Number in Excel are interpreted as Double by qbo insights, even if they happen to contain only integers and columns of type Date are interpreted as DateTime.

  • Excel only supports String, Double and Datetime.

../../_images/dataModel-ExcelConnectorFormats.jpg

Note

  • It’s recommended to start the data from the top left and to use ColumnFormat for data types.

  • If extra columns are shown in your logical data model usually these columns are named A, B, C etc., Such columns can be safely deleted from the model.

  • If RowId gets added as a new Integer attribute, and the users can choose to delete it.

qbo insights is able to automatically fetch the schema information from the selected data source and display it to the user. The user can then select the tables that they want to query for the data source being configured.

../../_images/dataModel-TablesSelected.jpg

Please note that more tables can be added later, and very often configuring the data model is an iterative process.

Data Source Settings

There are a few settings available for data source. Click Icon Vertical on the right side of the data source section.

../../_images/dataSourceSettings.jpg

Free form query configurations

qbo uses the ability of the NLP model to understand user queries with no prior training, data preparation or human-driven model building. Various options are available in the free form query configurations section. Let’s understand them in detail.

Max number of variations: In case of an ambiguity, the max number of alternative canonical query variations to be prompted to the user.

Max query prompt trial count: In case of an unclear query with missing attributes, the max number of the same clarification question to be asked to the user for the same missing attribute. If the user cannot provide a valid answer to fill the missing attribute within the trial count, qbo will stop trying to resolve the current query and ask for the next one.

Query prompt loop breaker intent: In case of an unclear query with missing attributes, the name of the resolved intent for the user input to break the clarification questions loop. After breaking the clarification loop, qbo will ask for the next query.

Query results expansion rate: The canonical query results returned by the similarity recognizer are scored according to their attributes. The highest scoring result is returned as the similarity result. If there are query results with very close scores, the query results expansion rate serves to add these query results to the similarity result as the alternatives.

Max branching depth: Explanation to be added

Max number of entities: Explanation to be added

Max number of recursive branches: Explanation to be added

Response: It defines the automatic response to be given to the user, in case the user’s query matches the intent.

Intents: It defines the intent names, which will be answered with the automatic response. The intent recognition rules are defined in the “Rule based intent recognizer configuration intent rules”.

Rule based intent recognizer configuration intent rules

Intent: The name of the intent

Pattern: The pattern to be recognized as an intent. The user’s query is matched against this pattern.

Max branching depth: Maximum depth of the grammar tree used to build a query template. Deeper trees might result in more complex query templates.

Max number of entities: Maximum number of entities allowed to be present in a query template.

Max number of recursive branches: Maximum number of allowed recursions in the same branch of the grammar tree. Usually used to implement conjunctions of the same grammar clause.

../../_images/NLPOptions.jpg

Sample Queries

This option is used to provide a list of possible example queries that current qbo instances can answer. These sample queries are picked by the help command in MS Teams.

../../_images/sampleQueries.jpg

Session timeout expiration

This option is used to set a timeout for the qbo server in minutes.

../../_images/sessionTimeout.jpg

Configuring the data model


The next step is where one can start editing the entities, specify relationships, etc. Each of the selected table in the step above is treated as an entity. One can use this step of the data model configuration to manage various aspects of the entity (e.g. entity kind, primary keys, default temporal attribute, etc.), add/remove attributes to the entity and also configure analytics for the entity. We will going through such configuration in this section.

One can start by selecting the entity to be edited from the listing on the left. Each entity has configurations for the entity, the attributes of the entity and the analytics associated with the entity. These 3 are grouped as 3 tabs on the top of the panel, for each entity.

../../_images/dataModel-EditEntities.jpg

Note

  • There is an expert mode available using which you can see JSON format of the data model. Click on the button on the top right-hand side Switch to expert mode. There are three sections visible showing entities, attributes and their relations. These 3 are grouped as 3 tabs on the top of the panel, for each entity using three models.

  • Entity-Relationship Model

  • Querybot Entity-Relationship Model

  • Querybot Relational Entity Mapping

../../_images/viewExpertMode.jpg

Entity configurations


Here is a quick description of the various configurations associated with an entity:

Configuration What it does?
Entity Name This is the name used to refer to the entity. If country is the entity name then the corresponding question that can be answered in the guided mode would be: What are the countries? and other variants of the question.
Entity Kind An entity could be of type object or type person. One would typically tag entities that refer to humans (e.g. a customer) as a person entity. An entity tagged as a person will have have the guided questions starting with who instead of what. For instance the question would be Who are the customers with gender female?.
Natural Order Attribute This is used to control the default sort ordering when multiple entities are listed as a result of a question. For instance, if last name of a customer is selected as the natural order attribute then the output of a question like Who are the customers? would be sorted by last name
Primary Key Attributes One or more attributes of the entity that uniquely identify the entity. This is a required selection and each entity instance should be uniquely identifiable using the primary key attributes. If a database schema has primary key already specified, the same would be configured as the default primary key attributes.
Default Temporal Attribute Attributes of the Date or Datetime type can be configured as the default temporal attribute. When configured, this attribute is used for the trend questions and to answer questions like What is the total revenue in the last 3 months?
Synonyms A set of alternative names for the entity. For instance, a customer may also be called a client or a subscriber. This assists both the NLP and guided questions by enabling alternative questions like "Who are the clients with salary greater than 100000? or Who are the customers?, where client and customer refer to the same entity.
Composite Date Attributes There are instances where the database table does not have a date attribute, but instead has day, month and year attributes. This configuration allows one to create a virtual date attribute, and then potentially A set of alternative names for the entity. For instance, a customer may also be called a client or a subscriber. This assists both the NLP and guided questions by enabling alternative questions like "Who are the clients with salary greater than 100000? or Who are the customers?, where client and customer refer to the same entity set of alternative names for the entity. For instance, a customer may also be called a client or a subscriber. This assists both the NLP and guided questions by enabling alternative questions like "Who are the clients with salary greater than 100000? or Who are the customers?, where client and customer refer to the same entity it as the default temporal attribute or use it for enabling temporal analysis.
Named Filters Similar to named aggregates, named filters help the administrators incorporate business terms into the bot vocabulary. Examples of these include terms like Singaporean, French being used to refer to customers from a certain country or high earners, defined as a condition over customer income.
Named Aggregates Business teams often have a well-defined set of terms that they use to refer to certain summations, averages and/or conditions. Example of these include net profit, gross margin or even revenue. Such named aggregates allow typical aggregates like count, sum, minimum, maximum, average, standard deviation and distinct count along with filter conditions.
Referenced Entities If you define reference relations in your QueryBot entity relationship model, this will enable generation of referenced entity filters in queries. Such reference relations are created by default based on foreign key relations when you use the QueryBot tooling to create the initial version of the QueryBot entity relationship model. For instance, say we have transactions referencing customers. This is an M-1 relationship (multiple transactions can have the same customer). In this case, each transaction is referencing a customer. In other words, the customer is the referenced entity and the the transaction is the referencing entity
Derived Attribute Derived attributes are the type of attributes that are made up of other attributes. They do not actually exist in the database schema. They are calculated on the fly and presented as regular attributes.
Variance Analysis (Analytics) In general terms, variance analysis is the quantitative investigation of the difference between actual and planned behavior. This also involves investigation of the factors that contribute to this difference. The goal is to identify various factors responsible for the variance/difference and take action based on this information. Let's take this example for a credit card dataset. We perform variance analysis of credit card usage or sale/revenue for the quarter and with respect to factors like location, merchant, plan, type. If we identify there is a decrease in a particular location or with a particular merchant, business can act. Same is the case with rise.

Canonical synonyms

A set of alternative names for the entity. For instance, a customer may also be called a client. This assists guided questions by enabling alternative questions like “Who are the clients with salary greater than 100000?, where client refer to the same entity.

NLP Synonyms

qbo uses the ability of the NLP model to understand user queries with no prior training, data preparation or human-driven model building. A set of NLP alternative names for the entity are used to enhance natural language queries experience. For instance, a customer may also be called a subscriber. This assists NLP queries enabling alternative questions like “Subscriber with salary greater than 100000? or Subscriber?, where subscriber refer to the Customer entity.

Composite date attributes


Composite means combining attributes and if fields in the database are separate, instead of a single date or datetime field we have day, month, year fields. Distributed databases work like this. So we combine them to get date attribute which are needed for queries where we do temporal analysis.

We have one example in our data set. Refer to the entity entity. We have fields day, month and year which can be combined to create time attribute which will be our composite date.

../../_images/dataModel-CompositeDate.jpg

This will enable us asking questions like:

What are the events that occurred after 2019-04-27?
What are the events that occurred on or after 2019-04-27?
What are the events that occurred before 2019-04-26?
What are the events that occurred on or before 2019-04-26?

We have various filters around composite date. We will see them in the later sections.

Named filters


Named filters are very useful in profile tables. For example, we can define named filters like Singaporean, defined as a customer with nationality Singapore; or woman, defined as a customer with gender Female; or equity, defined as an asset with category equity. Note that the named filter (like all attributes) should be defined in singular form. The plural form is automatically used by the bot as needed.

Similar to named aggregates, one can use named filters to incorporate business terms into the bot vocabulary. Examples of these include terms like from Europe, that are male. This enables questions like Who are the customers from Europe and Who are the customers for Europe that are male.

A number of filter types are available when configuring the named filter which includes following:


String filter This is used to create filters for filtering out certain value of an attribute and where attribute should be of type String. Let’s see one example of String filter of customer entity.

Related Videos

Create a named filter in qbo

../../_images/dataModel-ConfiguringNamedFilter-StringFilter.jpg ../../_images/dataModel-ConfiguringNamedFilter-Customers.jpg

Now you can ask questions like Who are the customers from Korea ?

Comparison filter This is used to make a comparison while filtering out certain values of an attribute. The threshold part of the comparison filter depends on the type of the attribute. In this example, it is of type boolean, hence the Boolean threshold is set. The other possible values are Date threshold, Date time threshold, Double threshold and Integer threshold. The operators can be Equal, Greater than, Greater than or equal, Less than, Less than or equal or Not equal.

../../_images/dataModel-ConfiguringNamedFilter-CompositeDateComparisonFilter.jpg

Complex filter

This filter is for combining different filters with a connective. The connective can be And, Or. The example filter selects the ids between 1-10, inclusive.

Composite date comparison filter

We can employ composite date comparison filters on composite date attributes, whose day, month and year attributes are mapped to different columns of the corresponding table.

Composite date component filter

The date and datetime values have more than one component. We can have individual filters on any of these. For example, to get the data which have April as their month, we can use the filter as shown in the example. The time units can be either of the Day, Hour, Minute, Month, Quarter, Week or Year.

Containment filter

For attributes of string type, we can have a list of values as a filter. This way, we can filter out the data which have a value other than the ones in that list. This is called a containment filter. The example filter only selects the data with a continent being Africa, Asia or Europe.

Composite date range filter

Filters that include composite means our fields in the database are separate; instead of a single date or datetime field we have day, month, year fields. Distributed databases work like this. So a different filter is needed for them. So, composite date range filter means we have a date range with start and end and this is shown on three different attributes of the entity as you can see below:

../../_images/dataModel-ConfiguringNamedFilter-CompositeDateRangeFilter.jpg

Composite date window filter

Window filters are rolling windows, not absolute date information is included, queries like last month, last 3 weeks are window filters. If this also done on composite date entities it will be composite date window filter, en example:

../../_images/dataModel-ConfiguringNamedFilter-CompositeDateWindow.jpg

Date component filter

Date Component filter is the filter we use when we query with a component (DayOfWeek, MonthOfYear, Year) So when we want to query the entity that happened on a monday, or in a January, we use Date Component Filter. This will enable us to ask queries like Products that listed on a Monday.

Existential filter

Existential filter filters the data by whether the specified attribute value is set to NULL or not. When inverted checkbox is checked, only the data where the specified attribute is set to NULL will be used. Likewise, when inverted checkbox is not checked, only the data where the specified attribute is set to some value other than NULL will be used. What are the products with a listing date? is a query you can ask about products with listing date attribute is set to some value other than NULL and likewise What are the products without a listing date? is a query you can ask about products with listing date attribute is set to NULL.

../../_images/dataModel-ConfiguringNamedFilter-ExistentialEntity.jpg

Referenced entity filter

Referenced entity filter filters the data of the current entity using a filter that runs on the entity that it references. The idea is clearer in an example. Assume we have two entities customer and country and, assume that the customer entity references the country entity. To ask about customers from Europe we can ask Who are the customers from countries with continent “Europe”?. with continent “Europe” part is a filter that runs on the country entity and we can use it to filter the data of the customer entity.

Referencing entity existential filter

Referencing entity existential filters are similar to referenced entity filters, but with the places of the entities switched. Similar to the example for the referenced entity filter, we can also ask about country entity using a filter that runs on the customer entity. To ask about countries that have at least one male customer we can ask What are the countries that have customers with gender “male”?. with gender “male” part is a filter that runs on the customer entity and we can use it to filter the data of the country entity. On the other hand, to ask about countries that have no male customers we can ask What are the countries that have no customers with gender “male”?. In order to do that we also check to inverted checkbox.

Referencing entity group filter

Referencing group filter filters data of the current entity using a filter that runs on an aggregated attribute of the entity that references it. We can have grouping filters on the entities which reference other entities, so that we can get the queried referencing entities For example: Who are the customers with the total amount of transactions that occurred in the last 20 days greater than 150? Here, we know that each transaction is associated with a customer, so in this case, the transaction is the referencing entity and the customer is the referenced one. There will be a group filter on the transaction amount and there will also be a simple filter regarding time attribute of transaction. You can ask questions like: What are the countries with the total number of resident customer greater than 100? We want to filter country entity here by using a filter that filters the customer entity but the filter is applied to an aggregated attribute which is the count of the rows.

Temporal range filter

This is analogous to the Composite date range filter but these run on the datetime field already present in your database.

Temporal window filter

This is analogous to the Composite date window filter but these run on the datetime field already present in your database.

Named aggregates


It is basically extending aggregated attribute with a variant filter and gaining a phrase. As mentioned before, business teams often have a well-defined set of terms (e.g. net profit, gross AUM) that they use to refer to certain summations, averages under certain filter conditions. One can not only configure such aggregates via the UI, but also setup synonyms for the same. Specifically, one needs to provide the phrase that would be used to refer to the aggregate, the attribute that will be aggregated (not required when aggregate function is count), the aggregate function that could be one of sum, min, max, count, distinct count, average, standard deviation, variance, population standard deviation and population variance, and the entity name usage criteria that could be either required or prohibited. The entity name usage, when set to required results in support for questions like What is the average customer spending versus questions like What is the canceled amount of the transactions when the entity name usage is required. In the aforementioned examples, average customer spending and canceled amount are named aggregates. One can also configure NLP synonyms for such named aggregates to include additional terms that could be used to access the names aggregate.

Related Videos

Create a named aggregate in qbo

../../_images/dataModel-ConfiguringNamedAggregates-Attribute.jpg ../../_images/dataModel-ConfiguringNamedAggregates-Functions.jpg

There are two other optional aspects to configuring a named aggregate, namely the normalizer configuration and the filter configuration. The normalizer configuration is used when one wants to normalize the determined aggregate using another aggregate computed using the same data. For instance, one as shown in the figure below for entity transaction, one can compute average customer spending by dividing the sum of spending by the distinct number of customers in the data.

../../_images/dataModel-ConfiguringNamedAggregates-Normalizer.jpg ../../_images/dataModel-ConfiguringNamedAggregates-Filter.jpg

The filter configuration is used to limit the data that is used to compute the aggregate. For instance, to calculate the amount associated with canceled transactions, one can setup a filter (canceled = true)to determine the canceled transactions. There are several other kinds of complex filter criteria that can be configured and more about the different kinds of filters will be discussed in the later section.

Named aggregates also have Display Configuration option. Following conditions should be met to get that option enabled.

We need the aggregation function, aggregated attribute type (may be null) and normalizer for a proper check. Hence, this may affect the changes on the normalizer as well.

  1. If the attribute type is INTEGER and the aggregation function is one of the [average, variance, std. deviation, population variance, population std. deviation], then the attribute type is set to be DOUBLE, but this can change if there is a normalizer and it has a normalized attribute type. So we cannot directly return for now.

  2. If the aggregation function is one of the [count, distinct count], then the attribute type is set to be INTEGER, but this can change if there is a normalizer and it has a normalized attribute type. So we cannot directly return for now.

  3. 1 and 2 are mutually exclusive.

  4. If there is no normalizer, we can return the attribute type we have found on 1 or 2.

  5. So we have a normalizer. If it has its own normalized attribute type, it is our eventual type. Hence we return that.

  6. If the attribute type is INTEGER up until this point, we make it to DOUBLE, because normalization is a division operation.

  7. We are done.

You can also use multiple named aggregates with filters in the same query. For instance, let’s assume that the ‘net transaction amount’ and the ‘cancelled transaction amount’ are two named aggregates with potentially different filters. You can ask:

What is the net transaction amount and canceled transaction amount?

The ability to refine aggregation queries to add/replace named aggregates with implicit entity names is now possible. For instance:

What is the total number of the transactions?
Refine and also show the net spending amount
What is the average customer spending and canceled amount of the transactions?
What is the average spending of customer ?
Refine and group monthly
Refine and also show the net amount

Referencing named aggregate

A referencing named aggregate is a calculated attribute. It references an existing named aggregate in a referencing entity. An attribute of this kind is calculated by grouping the data in the referencing entity by the current entity and running the aforementioned named aggregate for each of these groups.

For example, We have a named aggregate named net spending amount in the transaction entity. This named aggregate sums the amounts that are not cancelled in the transaction entity. We declare an attribute named spend in the customer entity. We declare that this attribute references the net spending amount named aggregate in the transaction entity. Values of this attribute are calculated by grouping the transaction entity by the customer and aggregating net spending amount for each of these groups. The result gives the total amount that is spent by each customer. We may choose to display this attribute along with the other attributes of the customer entity.

../../_images/dataModel-ReferencingNamedAggregate.jpg

Referenced entities


Reference is a relationship between entities in which one entity designates, or acts as a means by which to connect to or link to, another entity. The first entity in this relation is said to refer to the second entity.

Reference tables are a special type of profile tables. They are most useful when they are joined with profile, transaction or aggregate tables. For example, there might be a location reference table that gives the latitude and longitude of each city mentioned in another table. Or a product reference table that gives details about products mentioned in sales transactions or inventory table.

Some best practices for dealing with

  • Reference tables usually have a primary key that is used for joining with the other tables. For example, this could be a city name or a product id.

  • Reference tables can be specified as reference relations for other entities.

  • It is often useful to add attributes in the reference tables as attributes in other tables to simplify the query construction.

For this we need to add new entities. You can refer to the next few sections on how we can add or delete entities. Let’s see how these can be created using user interface.

Follow below steps to configure Referenced Entities:

  • Select Entity from the added list of the entities on the left side for which you want to configure Referenced Entities.

  • Scroll down to the Referenced entities settings.

  • Click on Add reference relation settings.

  • Select Referenced Entity from the drop down.

  • Select attribute from the drop down and provide phrases for referenced and referencing entities.

Let’s see a few examples of Referenced Entities.

  • Look at the schema of the CUSTOMER and COUNTRY entities. Entity COUNTRY has an attribute name and entity CUSTOMER has attributes country_of_birth and country that can be referenced. Here we will update COUNTRY as referenced entity in CUSTOMER entity.

../../_images/dataModel-ConfiguringReferencedEntities-Customer.jpg ../../_images/dataModel-ConfiguringReferencedEntities-CountryCustomer.jpg

Another example that we see coming from entities TRANSACTION and PRODUCT. Entity TRANSACTION has an attribute product_code and entity PRODUCT has an attribute code that can be referenced. Here we will update PRODUCT as referenced entity in TRANSACTION entity.
../../_images/dataModel-ConfiguringReferencedEntities-Transaction.jpg ../../_images/dataModel-ConfiguringReferencedEntities-ProductTransaction.jpg



After making all the changes first save draft and then activate draft to reflect your changes.

Later you can ask questions like:

Who are the customers born in country Korea?
Who are the customers in countries with population greater than 1000?
Transactions have product description hat?
Products making transactions with quantity greater than 9?

Multiple level nesting is also supported. For instance:

What are the transactions from customers in countries with population greater than 1000?

Note that these go from transactions to customers to countries.

Deleting an entity


There might be some scenarios where we need to remove the entity added during the setup which is not needed anymore. Then we have an option to delete an entity. Follow the below steps to delete an entity:

  • Select database from the added Data Sources

  • Click on the Edit button on the top right corner.

  • Click on the Delete icon to delete an entity. For example CUSTOMER.

../../_images/dataModel-DeleteEntity.jpg
  • After deleting click Save draft and then Activate draft.

Adding a new entity


There might be some scenarios where we need to add more entities from the database which were not added during setup. Then we have an option to add an entity. Follow the below steps to add an entity:

  • Select database from the added Data Sources

  • Click on the Edit button on the top right corner.

  • Click on the New entity icon to add an entity. For example PURCHASE.

  • Select table name from the drop down and provide entity name.

  • After adding click Save draft and then Activate draft.

../../_images/dataModel-AddEntity.jpg

Attributes of an entity


An entity can have 1 or more attributes. Such attributes can come from the primary table, or can come from other tables by means of specifying the appropriate relationship between the primary table and the table in question. When adding attributes from a table other than the primary table, it is important that the relationship conditions result in exactly 1 row being fetched from the target table. In a typical database schema definition, this is the case when there is a foreign key in the primary table that maps to the primary key in the target table.

Attribute with a path of foreign key relation

This type of attribute does not come from a column of the primary table. Instead, it comes from a column of a table related to the primary table. We can declare this type of attribute only if there is a foreign key relation path between the two tables, and all of these relations is either one-to-one or many-to-one.

For example:

We declare the country population as one of the attributes of the customer entity. We declare that the population attribute comes from the population column of the country table. Then we declare the foreign key relation path for it, which is essentially the table join conditions in order to get the desired column. The foreign key relation path for this attribute would be: [(customer.country -> country.name)]. We can do this because for each customer there is only one country.

We can also chain foreign key relation paths. For example, We declare country population as one of the attributes of the transaction entity. We declare that the population attribute comes from the population column of the country table. Then we declare the foreign key relation path for it, which is essentially the table join conditions in order to get the desired column. The foreign key relation path for this attribute would be: [(transaction.customer_id -> customer.id), (customer.country -> country.name)]. We can do this because for each transaction there is only one customer, and for each customer, there is only one country.

../../_images/dataModel-PathOfForeignKeyRelation.jpg

Note

There might be some cases where we have a foreign key relation but we do not want it to be validated then Referenced columns key validation disabled can be checked.

../../_images/dataModel-ForeignAttribute.jpg

For a selected entity, the attributes tab lists all the attributes associated with the entity and provides an easy interface for editing various configurations associated with an attribute.

../../_images/dataModel-Attributes.jpg

Related Videos

Logical data model and entities

The concept of primary key and value substitute

Path of foreign key


One can edit the name of the attribute and it does not need to be column name as retrieved from the database. This is achieved by clicking on the name of the attribute and the same becomes editable. The source column and the table name, primary or otherwise, from the data source is displayed under the column heading.

One can change the Data Type of the attribute as long as the same is compatible with the type in the backing data source. For instance, an Integer attribute from the backing data source can be mapped to a String, etc. The interface throws an error when in incompatible types are detected.

An entity may have several attributes, and if one wants to limit the attributes shown by default, in response to a query like Who are the customers or What are the transactions, then one can unset the Visible toggle for the attributes that need to be hidden. Please note that the user can still explicitly request for such attributes using a query like What is the age of the customers - and age will show up, even if the attribute is marked as not visible.

There are several other configurations that are enabled for an attribute, and the same are accessible by clicking on the chevron on the right of each attribute listing.

Value Substitutes

In the expanded configuration interface for an attribute, for attributes with String data type one can add Value Substitutes. This can, for instance, be used to map 0 to Female and 1 to Male - even if the gender column is of Integer type in the database. From the business perspective one can ask a query like Who are the customers with gender Female even when the database has 0 and 1 in the gender column.

../../_images/dataModel-AttributeValueSubstitutionTypeString.jpg

Grammar generation configuration


The grammar generation configuration is supported for Boolean, Numerical (i.e. Integer, Decimal, Double, Percent and Currency), String and Temporal (i.e. Date and ``Datetime) attributes. The same is detailed below for each data type.

Boolean attributes support positive and negative phrases. These phrases help us to create the same query in different forms. For example, the following query is the one without any configuration - What are the transactions that are canceled? In this query, canceled is a Boolean attribute. Now, if one would like to have the wording state in the query then for this attribute, we can add the configuration which has not in canceled state for the negative phrase and in canceled state for the positive phrase. Now the same query can be asked with the phrases we have provided in the configuration: What are the transactions in canceled state?

../../_images/dataModel-AttributeBooleanGrammarConfiguration.jpg

For attributes with Numeric data type a number of grammar generation configurations are available. The Between phrase can be used to configure the phrase used to find a number between two values. The Equality phrase, like priced exactly, as shown in the screenshot below adds a more human touch to the question phrasing. This configuration permits us to ask questions like What are the products priced exactly 10.0? instead of the default What are the products with price 10.0?. Other configurations like the Greater than or equal phrase, Greater than phrase, Inequality phrase, Less than or equal phrase and Less than phrase help refine the phrasing further.

../../_images/dataModel-AttributeNumericGrammarConfiguration.jpg

The Attribute type for aggregations is almost always summable for the numeric attributes, which means that the attribute can be summed. For some cases, like when an id is an integer or a product code is an integer, we may want to limit it to be only countable or comparable.

../../_images/dataModel-AttributeTypeForAggregations.jpg

For attributes with String data type, similar to the grammar generation configurations for the Numeric attribute types, a number of configuration phrases are available that help us humanize the queries to be asked. String attributes also allow an additional configuration that enables custom sort order. This custom sort order is useful for scenarios where the alphabetical sort of the String attribute is counter-intuitive. For instance, consider a segment attribute with values High Value, Mid Value and Low Value. These will get sorted as High Value, Low Value and Mid Value in ascending order and business user may not like the fact that Low Value appears in between High Value and Mid Value. Custom sort configuration alleviates this problem.

../../_images/dataModel-AttributeCustomSortOrder.jpg

For Temporal attributes like the Date and the Datetime attributes allow the configuration of a Connecting phrase. For instance, the attribute configuration below allows one to ask a query like What are the products that listed after 2019-04-27? instead of the default question What are the products with listing date after 2019-04-27?.

../../_images/dataModel-AttributeTemporalGrammarConfiguration.jpg

Display configuration


For Numeric attribute types other than Currency, one can set Number format, Number postfix and Number prefix configurations. This way, we can prepend a tag to the number or append a tag to the same number as the labels in the visualizations.

The configuration above makes for example 51.57M label to be displayed as counting 51.47M people.

../../_images/dataModel-AttributeNumericDisplayConfiguration.jpg

For Currency one can additionally set the currency code, so that each value for that attribute has that currency code added to the labels in the visualizations.

The configuration above makes for example 51.57M label to be displayed as making 51.47M USD profit.

../../_images/dataModel-AttributeCurrencyDisplayConfiguration.jpg

Synonyms


One can also setup synonyms for referring to an attribute. For instance, one can setup pay as a synonym for salary, and as a result questions like What is the salary of the customers and What is the pay of the customers will return the same response.

../../_images/dataModel-AttributeSynonyms.jpg

Derived attribute


Derived attributes are the type of attributes that are made up of other attributes. They do not actually exist in the database schema. They are calculated on the fly and presented as regular attributes. They are distinguished from the regular attributes by the derivedAttributeExpression field. If this field is filled, then we assume the attribute is a derived attribute. Another distinguishing point for derived attributes is that they do not have an attribute mapping defined for them. This is because they are calculated attributes and not physical ones. This also means they do not have value substitutes or path of foreign keys defined for them.

The expression for the derived attribute is a string written in the Unscrambl Chai Expression Language (UEL). We try to parse this expression and generate the SQL that is equivalent to it. With UEL we are able to support binary arithmetic operations, ternary operations, value literals as of now, and support for the remaining operations or functions is on the way.

Examples of derived attributes with expressions:

  • an attribute named profit can be defined as revenue - cost

  • an attribute named unit price can be defined as amount / quantity

  • an attribute named price over 100 or 0 can be defined as price > 100 ? price : 0

  • an attribute named quantity doubled can be defined as 2 * quantity

Let’s see an example in qbo. Where when new attributes are derived using existing attributes.

../../_images/dataModel-DerivedAttributeConfigure.jpg

Related Videos

The concept of derived attribute in qbo


Analytics that can be performed on an Entity


First, let’s define a few basic concepts related to analytics:

Metric: Any measurable quantitative value in the dataset like total count, average revenue. Metrics are usually computed by aggregating values from one or more attributes.

Dimension: Any attribute that can be used for grouping the entity. Usually, these are string attributes like city, team etc.

Variance queries allow comparing the values of a metric (also called behavioral aspect) between two different values of a dimension (also called a related aspect).

Qbo performs variance analysis on the metric by:

  1. Analyzing the key reasons for the differences. qbo digs into different dimensions (or aspects) that are related to the metric to identify the most significant values that explain the difference between the two values of the metric.

  2. Ranking the values of each dimension based on the absolute change, percentage change, and significance of the change in comparison to other values in that dimension. qbo presents the top k values across all available dimensions along with additional information about other related metrics.

There are two common types of variance queries:

Comparison based on a time attribute

This is the most common kind of variance query, and is commonly used to analyze why some metric has increased or decreased in a given time period compared to some baseline. There are two different ways of defining the baseline:

Comparison of the value of a metric between two different periods of time. Note that it is possible for the two different periods to be of different lengths of time – in this case, qbo just warns about this and proceeds with the analysis. Examples are: Date attribute name

Compare the spent amount in the last 3 days with the previous 3 days
Compare the spent amount in the current week with the previous 3 days
Compare the spent amount in the last 3 weeks including this week with the previous 3 weeks
Compare the spent amount in the last 2 months including this month with the previous 3 weeks
Compare the spent amount in the current week with that in between 2016-08-01 and 2016-08-02

Comparison of the value of a metric on a given time period with an average. In this case, the average is computed on the full dataset, and it can be a daily, weekly, monthly, quarterly or yearly average depending on how the time period is defined. Examples are : Compare the spent amount on 2016-07-18 with the average In this case, we compare the spent amount on a given day with the daily average over the whole dataset. Phrase the average is interpreted as daily average by default. Compare the spent amount in the last 2 quarters including this quarter with the weekly average

In this case, we compare the weekly average of the spent amount in the last 2 quarters including the current quarter so far, with the weekly average over the whole dataset. The Average period specified for the planned period is used for both time periods.

Comparison based on a non-time attribute This allows us to compare the value of a metric between different products, customers, regions or other non-time dimensions, and analyze the key factors behind the difference. Examples are :

Compare the spent amount of customer Maryjane Doyle with that of customer Tianna Stanley
Compare the spent amount of customer "Maryjane Doyle" with that of customer Tianna Stanley

A time period filter can be applied to these queries. Eg:

Compare the spent amount of customer Maryjane Doyle with that of customer Tianna Stanley in the last 3 weeks
Compare the spent amount of customer Maryjane Doyle with that of customer Tianna Stanley on or after 2020-04-10

Let’s take an example to understand how this can be configured in qbo:

  • Go to the Analytics section of the entity for which you do Variance Analysis.

  • If Date attribute name is provided time based comparison will be enabled.

  • If Only categorical aspects button is checked, you cannot set an INTEGER attribute as aspect (only STRING or LOCATION_NAME), if it is unchecked they can be STRING, LOCATION_NAME and INTEGER type.

  • Configure Aspects by selecting attributes from the dropdown and giving them a name.

  • Configure Behavioral aspects and map to the Aspects.

../../_images/dataModel-AnanlyticsConfiguration.jpg

Here we can see that for entity transaction, metrics (Aspects) number of transactions and total amount can be compared with respect to dimensions (Behavioral aspects) product code and customers

Here you can see that user John is able to compare the total amount of transactions with respect to time and product code.

../../_images/dataModel-AnalyticsQuestion.jpg

Note

  • Named aggregates can be used as variance behavioral aspects. At this time, there are some restrictions on the named aggregates that can be used as variance behavioral aspects.

  • Their aggregation function is sum or count.

  • All the attributes having the attribute type string, integer, and location name can be regarded as related aspects.

  • For manually added behavioral aspects, the querybot validates that the names of the behavioral aspects are the same as a named aggregate phrase of the same entity. If a named aggregate phrase is found that is equal to a behavioral aspect name, the querybot validates that the behavioral aspect has the same attribute name, the same aggregation type(behavior), and the same normalizer of that named aggregate.

Save, Deploy & Reload


As any process exhibits various behaviors and goes to various stages before we finalize the blueprint or the outline like building, testing and deploying similarly at various points while you are editing and making changes to the entities, attributes and their configurations in the Data Source Configuration, you can decide to either keep them or reject them. And even if you are keeping the modifications then you must make them reflect in the conversations.

To attain this lets see the whole behavior or the cycle of managing the revisions or modifications being done. This process involves following steps:

Edit Draft –> Save Draft or Discard Changes –> Activate Draft or Discard Draft –> Reload

../../_images/deployCycle.jpg

Let’s see steps below how to do this:

  • Click on the Eye icon to view the data source you want to edit.

../../_images/deployCycle-Reload.jpg
  • Click on the Edit button to start editing configurations.

  • After making all the necessary changes, always remember to click on the Save draft button to save your changes.

../../_images/deployCycle-SaveDraft.jpg
  • You will see a message Draft added successfully if all the changes are saved successfully.

  • If you want to go back to the data sources click on the Back button.

  • Now you will see an added icon of the draft you were working on in the data sources and you can click on the Edit icon to make further more changes.

../../_images/deployCycle-Drafts.jpg
  • After all the changes made you can again click the Save draft button followed by the Activate draft button.

../../_images/deployCycle-SaveDraft.jpg
  • You will see a message Draft activated successfully if the draft activated successfully.

  • At last you need to reload your data sources.

../../_images/deployCycle-Reload.jpg
  • You will see a message Query bot server reloaded successfully if the draft reloaded successfully.

  • Similarly we can Discard changes and Discard draft if needed.

Data access roles


We can define roles to restrict access to data sources means restricting entity access or attribute access for a set of users. That means data source based roles. In other words, a role allows a user to use a particular functionality within the solution so that user can perform a task or set of tasks. There could be a scenario where we need to restrict some level of data access from the developer where we might need to hide some critical information from the developer or for an instance we need to reveal a particular set of data to the analyst. There comes Data Access Roles in the picture.

Data access role configuration is accessible under the Data Sources link, under the DATA ACCESS ROLES tab. Data access roles are used to restrict the access of a user to certain entities. Such access restrictions can be further extended to the attributes of the entities, and even further, to limit access based on the value of the attributes.

../../_images/dataAccessRoles-Roles.jpg

For instance, one can create an Developer role that has access to customer and product entities only.

../../_images/dataAccessRoles-CreateNewRole.jpg

Let’s give this role Developer access to two entities that is customer and product.

../../_images/dataAccessRoles-AssignPermissionDeveloper.jpg

Now, one can further restrict the access of the Developer role and make the attribute gender not accessible to this role.

../../_images/dataAccessRoles-RestrictAttribute.jpg

Finally, one can restrict the access of the Developer role to product entities with code equal to US-HEALTH-01.08.

../../_images/dataAccessRoles-RestrictAttributeValue.jpg

Hovering on the role name will give you an icon to delete the role.

../../_images/dataAccessRoles-DeleteRole.jpg

Developer role has been defined with all the restrictions applied. Now we will assign one user under this role and see what is accessible to the user. For any user let’s suppose John here update the Data access role in the User Access Management from Access All to Developer.

../../_images/dataAccessRoles-SelectRoleOfUser.jpg

Now click on the Save Changes button to see the updated role of the user.

../../_images/dataAccessRoles-UpdateRoleOfUser.jpg

Now sign-in for user John and test for queries for entities.

../../_images/dataAccessRoles-Login.jpg

Here you can see that user John is not authorized to ask questions on event entity.

../../_images/dataAccessRoles-TestingQueries.jpg

So that’s a very beneficial part where depending on the data, the use case and criticality of the data restrictions can be made.