Theme: Entering redundant data in order to increase the speed in SQL
Together with the development of e-commerce and with the launch of new devices which we can use to access the Internet, the loading speed of on-line shops’ pages has become critical. The loss incurred as a consequence of users leaving the site due to the slow loading speed of pages is difficult to measure.
And in the case of e-commerce websites, a few seconds of delay in information download is directly reflected in the decrease of the consumers’ number and hence in a smaller number of orders. For the applications which run on a database, the first efficiency criterion is the speed at which the data from the SQL server, either Oracle, SQL Server or MySql are loaded.
If for a product description page the process is quite simple, as all we have to do is to execute a query on the products table by the product ID and a few links, things become much more complicated when it comes to the dynamically created categories.
The dynamically created categories are a Cartesian product of categories and products. A category contains several products, a product belongs to several categories.
Take for example the following situation:
– Categories table – category ID and category related information
– Products table – product ID and product related information
– Tags table – Tag ID and related information
– Junction table mapping products and tags (many-to-many relationship)
– Junction table mapping categories and tags (many-to-many relationship)
Here is how the tables look like (simplified version):
Links between the tables:
Even in the simplest scenario, we need to query 5 tables in order to obtain a list of products associated to a category. Things become even more complicated when we add products sorting based on price, best sellers, customers rating or newest products.
For example, sorting by “best sellers” is done based on each category. For the “Photo customized gifts for men” category we will take into account the sales from the latest 90 days (for instance). However, for the “Valentine’s Day: Gifts for women” we have to refer to the sales before Valentine’s Day. Otherwise, this sort would not be relevant.
It is not difficult to deduce that links to the orders tables must be added to our query. If the categories and products are relatively few from a quantitative point of view (a few thousands for products, double-digits numbers for categories), orders and their details can go up to hundreds of thousands.
A solution that we used for a customer was to create a pre-computed table which contained the direct link between categories and products, plus other useful information: rank by price, sales, ratings, novelties. According to the relational databases theory, the information contained in that table is redundant (can be obtained from the existing tables).
The diagram thus becomes:
Hence, our 7-10 tables query becomes a simple query performed on two tables: products and categories_products:
select P.ID_Product, P.Description_Product
from Products P
inner join Categories_Products CP on P.ID_Product = CP.ID_Product
Where CP.ID_Category = 24011
order by CP.Sales_Rnk
With the execution speed on a production database, the advantage is obvious:
The only disadvantage is that when a product is added/tagged, it won’t be displayed on the related categories pages until we run the stored procedure for the update of categories_products table. This is only a minor inconvenience for those working with the website’s Back Office (they need an extra click for data update) compared to the speed improvement obtained when displaying the page for the customers.
The update of data in the pre-computed table can also be done automatically, with an SQL job which runs at a pre-set time.
When the data loading speed is critical, we can digress from the normalization of relational databases in order to obtain the expected result.
For sure there are other ways to reduce the execution time of queries.
In fact, we must choose the most efficient solution for its users, depending on the particularities and characteristics of the application that we develop.
There is no universal formula having the same outcomes for any type of application.