Data Source Overview
A data source represents access via JDBC to a back-end database server. Each VDB definition must have at least one default data source associated with it, which will be used if no policy dictates what database should receive a query. Source configuration includes the connection properties, connection pooling, and Load balancing and cluster management characteristics.
The driver field allows either a new or existing JDBC driver to be specified. Heimdall comes with a small library of JDBC drivers that it has been tested with, but other drivers can be configured in the driver tab as needed. When using the proxy functionality of a VDB, the tested drivers should in general be used, as some internal and version specific features of a driver are often used by the proxy to provide compatibility with the binary layer translation that occurs.
The JDBC URL configuration will normally be build through the configuration wizard, or can be copied from an existing JDBC application. Properties on the JDBC URL can be moved to the name-value list in the connection properties, OR can remain in the JDBC URL–either works, although using the name-value list provides an easier to view list vs. a long URL. When internally doing database type detection, the JDBC url is also used to determine what type of database is in place, for per-database behavior adjustments. When using load balancing, this URL is not in general used, with one major exception--when using the hdUsePgNotify option, this will be the target used for notifications. This is necessary as Postgres notifications will not be shared between nodes, and as such this should point to a single name that resolves to a single target database.
The data source username and password are used for several purposes:
- For health checking the data source;
- When performing lag detection, to write to and read from the Heimdall schema;
- When doing cluster detection, reading cluster variables as appropriate, or when Heimdall needs to extract metadata for other purposes;
- When no user or password is specified in JDBC mode, this user will be used to connect to the database;
- In proxy mode, if proxy authentication is not provided. In this case, any user can connect, but the DB connection will be made with these credentials.
In some cases, it is necessary to specify various parameters in a dynamic way, i.e. with the Odoo application, it changes the active database at runtime, and connects to the desired database as desired. To support this, fill in the following values as necessary:
- ${catalog} or ${database} to specify the name of the database (either will work)
- ${schema} for the specified schema
- ${user} for the connection user
- ${password} for the password specified on connect (only for Postgres and SQL Server)
The test query option is used in a variety of situations, including health checks, and verifying that a connection is still viable after an exception. It is also used in a variety of conditions when pooling is configured, in order to validate the connection while idle, etc.
To validate that the options appear valid, please use the Test Connection button. This will initiate a connection through the vendor driver by the Heimdall management server, and acts as a general guide if the configuration appears valid. It is not a guarantee that under all conditions it is valid, however.
To automatically execute a set of SQL scripts on the data source, use the Execute scripts button. This will attempt to execute selected scripts located in config/scripts in the Heimdall installation directory. Scripts are grouped by database type into directories. You can modify existing ones or add your own to these directories to be able to execute them via GUI. Predefined files are put into place on Heimdall startup, these use the "heimdall" schema by default, and you can modify them, but if you choose to use a different schema, it's existence must be ensured.
Connection properties
Connection properties are database and driver specific. When in proxy mode, a default set of properties will be inherited by the source in order to provide compatibility with the proxy mode used. In JDBC cases, no default properties will be set by Heimdall. When using a known driver, all supported connection properties will be provided in a drop-down list, and tool-tip help provided for each one when selected. When using an unknown driver, the property name field will allow an arbitrary name to be specified and value provided, as per your JDBC driver's documentation.
In order to add a custom property (in the event it isn't in the drop-down list), you can use the custom property checkbox next to the blue plus, to provide a free-form field.
Special properties:
- dbTriggerTracking, dbTimeMs and dbChanges: Please see the Cache Theory section.
- initSQLAsync: Please see the Async Theory section.
- hdConnectRetryCount: Specifies the number of times a connection attempt will be made, defaults to 3.
- hdConnectRetryDelay: The length of time between retry attempts. This value doubles on every retry for a given connection. Defaults to 0 (ms).
- hdConnectTimeout: How long each attempt to connect will be made in ms. This option sets database driver specific values based on what is supported. Defaults to 1000(ms).
- hdUsePgNotify: Instead of using a cache's pub/sub interface, when using Postgres, use the postgres notify interface for cache invalidation and read/write split last write time notifications. Please note, this still requires a cache to be enabled, although it can be the "local only" cache option.
- hdNotifyDBs: When using hdUsePgNotify, by default only the default catalog is listened to. This option allows one or more catalogs to be listened to, for trigger based invalidation. A value of * means "all catalogs", with comma separated values starting with "-" removing from the complete list. Example: *,-template0,-template1 would be a generally good starting point.
- azureDbHost: Please see the [Azure] (../environment/azure.md) documentation for more details.
- dbTriggerTracking, dbChanges, dbTimeMs: Options to enable query based table change tracking, leveraging triggers. See the caching page for more details.
Please see the UI for a more complete list, including vendor options, as the drop-down list provides context sensitive help on each option provided.
Connection Pooling
The pooling section controls if connection pooling (the reuse of back-end connections for many front-side connections) is used internally to Heimdall, and if so, what the connection pooling properties are. Heimdall implements a variant of the Tomcat connection pool, and where appropriate uses the same properties for configuration.
The primary difference between the Heimdall and Tomcat connection pool is that the Heimdall pool implements it as a "pool of pools" where each user/database (catalog) combination becomes a pool within the connection pool, but overall resources are constrained by the overall maxActive option, which dictates the overall number of busy AND idle connections that can be established to the data source. Additional properties of maxUserActive and maxUserIdle dictate how many active (busy+idle) and idle connections are allowed on a per-catalog+user combination. This provides much more granular control over the behavior of the connection pool when many databases and users are connecting to the database server.
If the database has configured a type of idleTimeout (currently supported: idleSessionTimeout for postgres(version 14+), wait_timeout for mysql) if the value is detected sessions in idle pool will be deleted with an increasing chance in time based on idleTimeout, it is done to prevent having closed connections in the idle pool. If connections are not deleted fast enough consider changing the timeBetweenEvictionRunsMillis pool property.
The properties section provides a drop-down list of the available properties, and tool-tip help for them. The available properties are:
Important queries--in general these should be set in all cases:
- maxAge: The chances of a connection being closed when being returned to the pool (1 out of X) or during an eviction run if idle. Old behavior was the Time in milliseconds to expire the connection on return or idle (Long, 20). connections less than the timeBetweenEvictionRunsMillis will not be culled at random, so if there is activity on the connection at least that often, it will remain connected.
- maxActive: The maximum number of open connections (Integer, no default, however an attempt is made to detect the value from the server). This is equivalent to the connection limit in PGBouncer's max_client_conn option. If not set, the proxy will attempt to determine this from the database itself, but if more than proxy is active, this should be manually set to account for the maximum number of proxies, as it is applied on a per-proxy basis.
- maxUserActive: The maximum number of open connections for a particular user, can be overridden with a rule (Integer, 0)
- maxUserIdle: The max number of idle connections for a particular user, can be overridden with a rule (Integer, 0)
- maxWait: The time to wait in milliseconds before a SQLException is thrown when a connection is requested, but can't be honored due to the number of connections (Integer, 30000)
- resetQuery: The query to run after a connection is returned to the pool (String, no default)
- testOnReturn: True if validation happens when a connection is returned, implied with a reset query (Boolean, false)
- validationQuery: The query to run during validation (String, select 1)
- rateLimit: Total connections/ms that can be opened, (double, 2)
- maxBurst: Number of ms of burst that can be allowed before clamping occurs (double, 50)
Other properties:
- abandonWhenPercentageFull: Connections that have been abandoned isn't closed unless connections in use are above this percentage (Integer)
- commitOnReturn: If autoCommit!=null and autocommit is not set then the pool can complete the transaction by calling commit on the connection as it is returned to the pool (Boolean, false)
- rollbackOnReturn: If autoCommit!=null and autocommit is not set then the pool can terminate the transaction by calling rollback on the connection as it is returned to the pool (Boolean, true)
- connectionProperties: The connection properties that will be set for new connections. Format of the string will be [propertyName=property;]* (String)
- defaultAutoCommit: The default auto-commit state of connections created by this pool. If not set, default is JDBC driver default (If set to "null" then the setAutoCommit method will not be called.) (boolean, true). When the connection is returned to the pool, this state will be reset if necessary.
- defaultCatalog: (String) The default catalog of connections created by this pool.
- defaultReadOnly: (boolean) The default read-only state of connections created by this pool. If not set then the setReadOnly method will not be called. (Some drivers don't support read only mode, ex: Informix)
- defaultTransactionIsolation: (String) The default TransactionIsolation state of connections created by this pool. One of the following: NONE,READ_COMMITTED,READ_UNCOMMITTED,REPEATABLE_READ,SERIALIZABLE. If not set, the method will not be called and it defaults to the JDBC driver.
- initSQL: A SQL executed once per connection, when it is established (String)
- logAbandoned: If true, stack trace will be recorded and printed out for timed out connection (Boolean)
- logValidationErrors: Log errors during the validation phase to the log file (Boolean, false)
- minEvictableIdleTimeMillis: Minimum amount of time a connection stays idle before it is evicted (Integer, 60000)
- removeAbandoned: True if connection in use can be timed out (Boolean)
- removeAbandonedTimeout: Timeout in seconds for connections in use (Integer, 60)
- suspectTimeout: Timeout in seconds for connection that suspected to have been abandoned (Integer)
- testOnBorrow: True if validation happens when a connection is requested (Boolean)
- testOnConnect: Validate connection after connection has been established (Boolean)
- testWhileIdle: True if validation happens when a connection is not in use (idle) (Boolean)
- timeBetweenEvictionRunsMillis: Sleep time for background thread in between pool checks (Integer, 5000)
- validationInterval: If larger than zero than validation will only occur after the interval milliseconds has passed (Long, 3000)
- validationQueryTimeout: The timeout in seconds before a connection validation/reset queries fail (Integer)
For additional information about each pool parameter, please see the Tomcat pool attributes page (https://tomcat.apache.org/tomcat-9.0-doc/jdbc-pool.html#Attributes).
Warning: Not all applications will be compatible with connection pooling. If issues are observed in initial application testing, it is recommended that disabling connection pooling be performed in order to isolate if this feature is triggering undesirable behaviors in the application. If using the connection pooling, the default number of connections is set to 1000. To adjust this, change the “maxActive” setting to the desired number of connections.
Connection Pooling options
There are commands specific for connection pooling such as show pools and clear pool described in the Inline commands section.
Postgres only: There is an option to clear a pool when making a connection to it is impossible due to the pool already being at maximum capacity. Please see Clear pool option.
Load balancing & High Availability
Due to the complexity of this topic and how the options interact with other subsystems, please see the help section dedicated to Load Balancing for more details.
Group Mapping
The Group mapping feature provides a flexible approach to associate a group (e.g. LDAP) with specific database roles. In the absence of group mapping, Heimdall defaults to mapping groups to roles with identical names. This option applies when the heimdall sync_user script is used with ldap authentication. Group mapping will also be triggered when user session is created via portal. Users can configure group extraction based on the group regex when the "Use Regex" checkbox is enabled, if checkbox is disabled, noone of the regexes will be applied to the mapping. Heimdall supports the use of capture groups within regex patterns. This enables the extraction of specific substrings from group names for role mapping. For example, if configured as on the screen, the 'heimdall-admin' group will be mapped to the role '${1}' which leads to 'admin' substring. When no mapping is configured, this option can be used as "deleting" group.
Roles Management
First part of this section allows to make all session requests justification for current data source require to match given pattern. Toggling 'Require justification pattern' will override any global setting under Admin/Portal. Leaving 'regex pattern' empty can be used to disable the global setting. Invalid justification message should be provided to inform the requesting user how should justification look like.
Next part of this section is used to manage roles in the database. In the configuration for a single entry with a role, you can specify a notification alias associated with an email group. This group will be informed when, for example, a session is requested for a particular role. Additionally, there are other options like the one to set the number of approvals needed to approve a specific role.
The role configuration is described as follows:
- Filter system roles: If selected, predefined system roles for databases will not be displayed on the roles names list.
- Role Name: Used to select a role from the database for which we want to create a configuration.
- Notification Alias: Here, you indicate the notification entry created in the Admin tab, which will be used to notify users associated with it when an action concerns the specified role. After clicking on the envelope icon next to it, you can view the list of emails associated with the specified notification.
- Default Time: Specifies the default session duration for the given role.
- Maximum Time: Determines the maximum time for which a particular role can be requested.
- Approvals Needed: Used to specify the number of approvals required for the requested role to be approved. The default value is 1. You cannot set a value higher than the number of emails associated with the specified notification alias. If no emails are associated with the notification, the value is set to 0, and it cannot be changed.
- Special Roles: Check to allow for special postgres roles present in this role such as 'rolsuper' or 'rolcreaterole' should be inherited (postgres only).
Warning: This feature is NOT functional when using MySQL with version < 8.
Database Browser (beta)
Warning: Ensure that the configured data source user has superuser privileges. Without superuser privileges, the Database Browser may function incorrectly, leading to potential issues such as inaccurate information and the inability to update privileges.
The Database Browser is a versatile tool designed for viewing database structures and modifying permissions. It operates on multiple levels to provide detailed access and control. We can distinguish the following sections:
- Databases: The entry point where you select the database to work with. For SQLServer and Babelfish there are two catalogs provided with Server Users and Server Roles.
- Identities: Identities are categorized into catalogs. You can select a specific identity in the database to open the below sections. A distinction has been made between identities (identities that cannot log in), regular users (identities that can log in), superusers (identities that have super permissions) and groups (only for redshift, those are identities that cannot log in). When an identity is selected, it will be highlighted in red.
- Permissions: You can view the permissions of an identity, such as the ability to log in, superuser, etc.
- Schemas: Explore various schemas assigned to identities in the database.
- Tables: Access and manage tables, you can view and modify table and column permissions.
- Views: Access and manage views, a subset of tables, for tailored data representation. Similarly to tables, you can modify view and column permissions.
Depending on the type of database, the above description may vary slightly. For example, in MySQL there is no schema level, in SQLServer we have a distinction between database and server users. The above description represents the full functionality available for PostgreSQL databases, however, the idea remains the same.
By clicking the ✓ or ✖ near the permission and then click the Commit button you can save the updated permission.
By clicking the Refresh button you can fetch the newest data.
By clicking the Report button you can generate and download a report of the data source structure for the currently selected filters.
AWS Essentials
When operating within the AWS environment, users will encounter an enhanced configuration option: the AWS Secrets Manager checkbox located adjacent to the username field. Enabling this option reveals an additional field for the Secret Name. Upon inputting the AWS Secret Name and subsequently clicking the Commit button, the system will automatically retrieve credentials such as the username and password from AWS Secrets Manager.
These credentials are cached on both manager and proxy instances. This configuration supports automatic refreshing of these values in case a rotation happens. This is done only when the connection fails due to the cached credentials being out of date and authentication error happening on the data source.
Furthermore, in instances where a health check is configured and fails, such as during password rotation, the system will attempt to retrieve updated credentials from AWS Secrets Manager as well. In the event of a proxy-induced change to the username and password, users are advised to refresh the DataSource page to view the newly uploaded credentials.
For those utilizing a custom secrets manager, it is imperative to adhere to the structure outlined in AWS RDS secrets manager. For instance, the JSON format should resemble the following: {"username":"test","password":"test"}.
Configuring Trigger Based Invalidation
In order to support out-of-band data ingestion into the database, a data source can be configured for "Trigger based invalidation". There are several options available on the data source for this configuration:
- dbTriggerTracking: set to true if this feature is to be enabled
- dbTimeMs: A query to retrieve the current time in MS on the database, to insure timestamps are synchronized
- dbChanges: A query to retrieve a timestamp (as a long) and table name (as a string), for the last invalidation time of that table.
The dbTimeMs call for example defaults to the value of "SELECT heimdall.get_time_ms()", usable on MySQL without any changes. The dbChanges call defaults to {CALL heimdall.changes(@ts)}. This can be a stored procedure. The string "@ts" will be replaced automatically with the last modification (per the DB time) that has been observed, so can be used to pull only the tables that have been modified since the last invalidation observed. The table name returned needs to be fully qualified, as observed with the "printTables" option for the rules, as this is the table name we will be comparing for invalidation purposes.