Essbase BSO Parallel Data Loads
As I continue down the path of my Essbase testing and benchmarking, I’m always looking for ways to make Essbase lay waste to hardware. As I was working on my new benchmarking application, I needed to load a lot of data into a BSO cube. I’m impatient and noticed that the data load was terribly inefficient at using the available hardware on my server. Essbase was using a single CPU thread to perform the load. So how can we make this load more intensive on the server and more importantly…faster? Essbase BSO Parallel Data Loads!
I know what you’re thinking, you can’t load data to a BSO in parallel. That only works in ASO, right? Wrong! Now, admittedly the ASO functionality for parallel loads is a lot more flexible, but starting in 22.214.171.124, BSO now has a basic way to perform parallel loads. Before we get to that, let’s take a look at the SQL load that was performed. The data set is roughly 10,000,000 rows. This is the basic MaxL code used:
import database EssBench.EssBench data connect as hypservice identified by 'mypasswordnotyours' using server rules_file 'dRev' on error write to "e:\\data\\EssBench\\dRev.txt";
Now let’s take a look at our resource usage:
Clearly we aren’t making good use of all of those CPU’s. And here’s the timing results:
At 333 seconds, that’s not bad. But can we do better? Let’s try this as a text file and see how it compares. I exported by data to text file and changed up my MaxL to this:
import database EssBench.EssBench data from data_file "e:\\data\\EssBench\\dRevCogsStats.ascii" using server rules_file 'dtRev' on error write to "e:\\data\\EssBench\\dtRev.err";
And let’s look at the resource usage:
That looks familiar. We are still wasting a lot of processing power. And how long did it take?
With a time of 331 seconds, we are looking at a virtual tie with the SQL-based rule. Now let’s see what happens when we break up the file into 16 parts (more on this another day). Here’s the MaxL:
import database EssBench.EssBench using max_threads 16 data from data_file "e:\\data\\EssBench\\dRevCogsStats*.txt" using server rules_file 'dtRev' on error write to "e:\\data\\EssBench\\dtRev.err";
We have 16 threads, let’s use them all! We have 16 files, so let’s see what happens:
That’s more like it! We still aren’t using all 16 threads fully, but at least we are using more than one! So how long did it take?
We are sitting at 219 seconds now. This is an improvement of roughly 34%. That’s a pretty nice improvement, but not nearly the improvement we would hope for given that we went from less than 10% CPU utilization to over 70% utilization. Why then did we not get a better improvement? That’s a question for another day.
In general, I found it interesting that the SQL-based load rule and the text-based load rule performed exactly the same. Obviously, the SQL-based load rule would be the faster of the two options given that we don’t have the overhead of first creating the text file. Next time, we’ll take a look at how to split a file using PowerShell.