Introduction Insert Bulk Bestia

Introduction

There are countless methods of copying data from one database to another, however, here we present a tool that performs this operation in a very flexible and efficient way. This application is also responsible for managing many of the operations that are normally performed manually during data copies. Thus, among others, the application is perfectly adapted to the following needs:

  • Projects on databases that are in continuous evolution. This implies that the pre-production (staging) or development databases require part of the existing data in production.
  • Full copy of the data in the database without the need to restoring a backup. The reasons for this action can be diverse. Insert Bulk Bestia will undoubtedly be the best option.
  • The current data protection law requires the concealment of certain data. Along with the possibilities that SQL Server offers, Insert Bulk Bestia will allow a custom concealment.
  • To make copies of data from an ONLINE production database. Insert Bulk Bestia can be configured to minimally affect production server performance, while maintaining data consistency after copying is complete.

SqlBulkCopy

Insert Bulk Bestia is based on the use of .NET SqlBulkCopy instructions in parallel in order to minimize the time spent copying data. The strategy used in this parallelization consists of starting with the heaviest tables, and in parallel making the copy of the lightest tables. This allows, in many cases, that the time that the copy lasts is reduced to the time spent in the copy of the table that takes up the most disk space.

In addition to the documentation provided by Microsoft, an in-depth reading of the article is recommended: Whitepaper: performance of SqlBulkCopy. This article will give you the knowledge you need to decide on the appropriate application configuration so that you get the most out of your hardware.

Indexes

During the different tests carried out with the application in several projects, it was observed that the best strategy to use consisted of: eliminating the indexes of the destination tables, copying the data and rebuilding the indexes in the destination.

However, on certain occasions, this strategy did not work correctly. It was found that when the cluster index of a given table is very large (it takes up a lot of disk space), the best strategy was not to eliminate said index. For this reason, in Insert Bulk Bestia a parameter was introduced to indicate from what index size, the cluster indexes are not eliminated. Said parameter must be established experimentally based on the characteristics of the available hardware and software.

Constraints and Foreign Keys

In order to increase data copy performance, Insert Bulk Bestia removes existing restrictions on target tables and disables triggers. The strategy implemented in Insert Bulk Bestia assumes that the data already meets the necessary conditions to be inserted into the tables, since a copy of data is being made from a source database to a destination database with "practically" identical characteristics.

The elimination of restrictions and triggers, especially the Foreing Keys, allows you to abstract from the business logic, and, for example, forget about the circular dependencies between tables. As a consequence, it is possible to make copies of databases that are ONLINE, since once the data has been copied, Insert Bulk Bestia is in charge of eliminating those records that prevent the lifting of a certain Foreing Key. It may happen that during the copy, the original database has been modified and certain records have not been copied.

Data Copy Parameterization

Insert Bulk Bestia performs a "default" parameterization of the data to be copied so that they are copied as is from one database to another. However, the application operator can perform a manual parameterization of the load to modify this behavior. Thus, it is allowed:

  • Table classification: Tables can be classified by groups to later indicate to Insert Bulk Bestia which groups of tables they want to load.
  • Table segmentation: Tables can be given a "WHERE" clause in such a way as to segment the table and thus allow a parallel load of the total table, or to make a copy of, for example, the last segment corresponding to the data most recent in the table.
  • Data transformation: Tables can be given a "SELECT" clause so that transformations can be made to the data, either to hide them or perform complex operations on them.
  • Data filtering: In addition to the segmentation noted above, the "WHERE" clause can be used for data filtering so that not all of the data is copied from source to destination.

Custom Actions

Insert Bulk Bestia Allows the application operator to schedule actions that are executed at certain times of the data copy:

  • Actions at the source: These actions are executed after the analysis of the tables in the source database. They should contain actions that automatically modify the selection of data from the source, or the performance of an automatic classification of tables. For example, these actions should automatically modify the SELECT or WHERE indicated in the previous section.
  • Actions at the destination:
    • Initial actions: These are actions that are executed immediately before the data copy starts. They can be actions designed at performing checks on the target database structure, or they can be designed at modifying the target fields where the data will be saved.
    • Final actions: These are actions that will be executed after the completion of the upload. They can be used to perform final transformations to the data, such as deleting emails or custom masking of data.

SQLCLR Assembiles

Insert Bulk Bestia creates the table "ibb.Log_Errores", into which all the information associated with the data copy is dumped, along with any errors that may occur. This table, and the associated stored procedures, can be very useful in your own development.

In addition, a series of .NET libraries are inserted into SQL Server with which the operation of the application is optimized, and which can also be useful in your own development:

  • Autonomous transactions (SQLServerAutonomousCalls): This .NET library allows inserting records in the "ibb.Log_Errores" table simulating Oracle autonomous transactions. In other words, the record inserted into the table will remain there even if a ROLLBACK statement is executed.
  • Treatment of character strings (SQLServerSafeFunctions): This library processes, in the most efficient way possible, character strings to convert them into tables. One of them creates a single field table from a character string and a separator. The other creates a table of two fields (key - value pairs) from a character string, a field separator, and a record separator.
  • Parallel execution of processes T-SQL (SQLServerAsyncExecProcedures): This library contains a series of stored procedures that allow the parallel execution of statements and / or stored procedures written in T-SQL. It is very useful in those processes in which the sequential execution of queries is not necessary, such as, for example, the reconstruction of indexes in parallel or the transfer of data in parallel through a "linked server", improving the poor performance of this method offers.

User Interface - Command Line

Insert Bulk Bestia contains a user interface with which the application operator can define the behavior of copying data from a source to a destination in a very "friendly" way. Each of these copies of data is called a "configuration" and is stored in a ".ini" file. From this user interface all the defined configurations can be managed.

However, in order to automate data loads, Insert Bulk Bestia can also operate from the command line, in such a way that it can be called from Control-M type process managers. From the command line you can indicate which "configurations" you want to execute. The application will return a 0 if the configurations have been executed correctly or a value other than 0 if any have failed.

Advantages

Undoubtedly, the great advantage of Insert Bulk Bestia is the customization of "what data" you want to copy from one database to another, and "what transformations" you want to perform on said data. It is not a backup strategy, but a "part" copy of the data from a source database to a destination one.

The tool operator is freed from having to define a data copy strategy according to the defined business logic, as the application will "unmount" the target database structure, copy the data, and remount said structure. This will make the destination database in almost "optimal" conditions to be used, since it will have totally new indexes, new statistics, and, depending on the parameterization, it could have minimal internal data fragmentation and indices.

All this is done with an efficient use of available hardware and software resources, which together with the possibility of scheduling the repetition of this type of tasks periodically, can lead to being an essential application in projects whose evolution is constant.

The assembiles that Insert Bulk Bestia inserts in SQL Server are sufficiently generic, efficient and useful that their own developments can benefit greatly from their use.

Finally, it should be noted that there are countless tools and applications that allow this process to be carried out, but none of them includes all the aspects that Insert Bulk Bestia includes.

Disadvantages

Unlike backups, this process is undoubtedly much more costly in terms of resource and time consumption. And without a doubt, it is more expensive to configure and "fine-tune" for proper operation. It is the cost associated with not distributing production backups to other environments, where data security can be jeopardized.