In this post, I will cover some lessons learned and key differences in performance between the two systems. Throughout the project, we ran hundreds of tests to gauge the stability and performance of both solutions under the conditions they would live in, and the results were not always clear cut. Overall, it is hard to say which product performs better, but after some tweaking we were able to get Solr to perform as well as, or in some cases better than, IDOL; we did have to compromise on some functionality to get there.
First, let’s loop back to the requirements and cover the architectures of both solutions. We have two servers; each one holds 4 content engines and a DAH. The content engines on the second server are a mirror copy of the data on the first server. We replicated this in SolrCloud by deploying 4 shards with two replicas each. On the IDOL side, we also had another DAH that pointed to the two lower DAHs and round-robined queries between them, so all queries were load balanced across a set of four content engines on either box. This provided good performance as well as high availability. One point to reiterate is that we were using IDOL 7, which is a few years behind. With SolrCloud, we didn’t have DAHs or DIHs; queries went from the client to any of the replicas, which sorted out the result set and returned it to the client.
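For reference, a layout like this can be stood up with Solr’s Collections API. Here is a minimal sketch; the host, collection name, and configset name are placeholders, not our actual setup:

```python
# Sketch: create a 4-shard, 2-replica SolrCloud collection via the Collections API.
# The host, collection name, and configset name below are placeholders.
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "action": "CREATE",
    "name": "products",                        # placeholder collection name
    "numShards": 4,                            # mirrors the 4 content engines per box
    "replicationFactor": 2,                    # mirrors the mirrored second server
    "collection.configName": "products_conf",  # placeholder configset
})
url = "http://solr-node1:8983/solr/admin/collections?" + params
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8"))
```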
Now, let’s lay out our testing scenarios. We had a list of roughly 2,000 of our most frequent user queries that we used to hammer the systems. Our front end also supports multiple types of searches, and each search can generate a few additional calls to Solr; all of these were included in the tests, and although each call was separate, the execution times for related requests were combined to get a bigger picture. Each test consisted of several load levels: most tests started at around 10,000 queries per hour and increased by 5,000 or 10,000 every 30 minutes. The maximum load we tested was approximately 60,000 queries per hour.
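To give a feel for the ramp-up, here is a rough sketch of that kind of stepped driver. The endpoint, query file, and step sizes are illustrative rather than our actual harness, and this single-threaded version only shows the shape of the ramp; the real test ran many concurrent workers:

```python
# Sketch of a stepped load test: ramp from 10,000 to 60,000 queries/hour
# in 30-minute steps. Endpoint and query file are placeholders; a real harness
# would use concurrent workers to actually sustain the higher rates.
import time
import urllib.parse
import urllib.request

SOLR_URL = "http://solr-node1:8983/solr/products/select"   # placeholder
with open("top_queries.txt") as f:                          # placeholder query list
    queries = [line.strip() for line in f if line.strip()]

step_qph = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000]  # queries per hour
step_seconds = 30 * 60

for qph in step_qph:
    delay = 3600.0 / qph                      # target spacing between queries
    step_end = time.time() + step_seconds
    i = 0
    while time.time() < step_end:
        q = urllib.parse.quote(queries[i % len(queries)])
        start = time.time()
        urllib.request.urlopen(f"{SOLR_URL}?q={q}&rows=10").read()
        print(f"{qph} qph\t{time.time() - start:.3f}s")
        i += 1
        time.sleep(max(0.0, delay - (time.time() - start)))
```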
Since our performance goal was to do as well as IDOL, we started by establishing an IDOL baseline with a few test runs. IDOL performance was stellar for each individual query; however, because facet requests in IDOL are a separate call, we had to combine the times for both requests. This added some overhead, but the combined time was still less than half a second.
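Conceptually, the combined measurement looked something like the sketch below. The host, port, action names, and parameters are illustrative assumptions, not lifted from our configuration:

```python
# Sketch: measure the combined wall-clock time of the two IDOL calls that
# together make up one "search" (result query plus a separate facet call).
# Host, port, action names, and field names are illustrative placeholders.
import time
import urllib.parse
import urllib.request

DAH = "http://dah-host:9000"   # placeholder top-level DAH

def timed_get(url):
    start = time.time()
    urllib.request.urlopen(url).read()
    return time.time() - start

text = urllib.parse.quote("example user query")
t_results = timed_get(f"{DAH}/?action=Query&Text={text}&MaxResults=10")
t_facets = timed_get(f"{DAH}/?action=GetQueryTagValues&Text={text}&FieldName=CATEGORY")

print(f"results: {t_results:.3f}s  facets: {t_facets:.3f}s  "
      f"combined: {t_results + t_facets:.3f}s")
```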
While the IDOL tests were running, we monitored query performance as well as system utilization. RAM and CPU utilization grew essentially linearly with load; each content engine was given about 10GB of RAM for caching and internal workings and never exceeded it. CPU utilization on these 128-core boxes was almost always below 25%, even at peak load.
As we pushed the system beyond our initial design requirements, we noticed a sudden increase in query lag. This was due to the thread count we had specified for each DAH: the threads were maxed out, so searches had to wait a little while to be executed. Naturally, if this were a production environment we would deploy a few more DAHs to support the increase in search traffic, but since that wasn’t our testing objective and we were short on time, we decided to call it a day. The main takeaway here is that the IDOL content engines weren’t really breaking a sweat throughout most of these tests.
We repeated all of these tests with Solr, with the necessary adjustments for the internal workings of that system. It is worth noting, however, that all business rules, fields, weights, etc. were preserved and replicated in Solr as closely as we could match them.
As we began to test Solr, with our limited experience with the system, we immediately started running into problems. The first and most critical problem was system stability: Solr kept crashing during our tests. The culprit turned out to be Java garbage collection pauses, which caused SolrCloud to time out from ZooKeeper. Our handy consultants were able to help us address this issue, but the documentation on this kind of problem is minimal at best; we would never have been able to track down this root cause on our own.
If you are running into stability issues with Solr, garbage collection is definitely worth investigating. One confirming symptom is a significant pause in the log file with no activity whatsoever, followed by a timeout from ZooKeeper and a few funny error messages.
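A quick way to spot this is to scan the log for long gaps between consecutive entries. A rough sketch, assuming a typical timestamp-prefixed log line (adjust the pattern and path to whatever your Solr version writes):

```python
# Sketch: flag long gaps between consecutive log entries, which is how the
# GC pauses showed up for us -- dead air in the log, then a ZooKeeper timeout.
# The timestamp format and file path below are assumptions.
import re
from datetime import datetime

TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2})")
THRESHOLD_SECONDS = 5.0
LOG_FILE = "solr.log"   # placeholder path

prev = None
with open(LOG_FILE) as f:
    for line in f:
        m = TS_PATTERN.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1).replace("T", " "), "%Y-%m-%d %H:%M:%S")
        if prev and (ts - prev).total_seconds() > THRESHOLD_SECONDS:
            print(f"gap of {(ts - prev).total_seconds():.0f}s before: {line.rstrip()}")
        prev = ts
```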
Once the stability issues were addressed, we had to examine query performance. We were getting very mixed results on some queries, and it took us a little while to determine the exact cause. We were seeing generally good performance, better than IDOL, but on some queries the system would just stall and take between 3 and 5 seconds to respond.
This is where we had to sacrifice some functionality. We have a unique identifier that is shared between several documents; when searching, we generally only want to see the data related to that identifier, although data from the individual items is searchable as well. In both systems this kind of grouping is a piece of cake, since we can easily group by a field value; the culprit was getting facet counts based on this grouping. For some of the broader terms, the result sets returned were pretty large, somewhere around a million documents, and Solr had to calculate facet counts per group, which caused the delay. We had to get the customer’s approval to turn off this functionality, and in doing so were able to improve Solr’s performance at a slight loss of functionality.
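In Solr terms, the expensive combination was result grouping plus per-group facet counts; dropping the per-group counting, while keeping the grouping itself, is what bought back the speed. A sketch of the query shape, with made-up field names and endpoint:

```python
# Sketch: the expensive query shape -- grouping on a shared identifier and
# asking Solr to compute facet counts per group rather than per document.
# Field names, facet fields, and the endpoint are placeholders.
import urllib.parse
import urllib.request

SOLR_URL = "http://solr-node1:8983/solr/products/select"   # placeholder

params = {
    "q": "broad term",
    "group": "true",
    "group.field": "shared_id",     # the identifier shared across documents
    "facet": "true",
    "facet.field": "category",
    # This is the piece we ended up turning off: per-group facet counts were
    # what stalled the broad queries with ~1M-document result sets.
    "group.facet": "true",
}
url = SOLR_URL + "?" + urllib.parse.urlencode(params)
print(urllib.request.urlopen(url).read().decode("utf-8"))
```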
We also measured system performance during the Solr tests. I think it was worse than IDOL’s: CPU utilization was never stable and resembled what I would imagine to be a crack head’s EKG chart. The JVM heap was configured at 32GB on each server as well, which totals less than what the IDOL engines got. I wish we could have given it more RAM, but due to the garbage collection problems we were advised against it.
In the end, we were able to shave off about a quarter of a second on average for almost all of our searches, which is a pretty tangible improvement in my opinion. However, we never went back to test IDOL’s performance without facet counts based on groups, so I can’t say for certain which one is faster.
So here is my take on it. As far as performance goes, I think Solr is faster than IDOL 7, mainly because you can retrieve facets and results in the same query. However, as far as performance bang for the buck, I think IDOL does a much better job. Its hardware resource utilization is much more controlled and predictable.
Additionally, my personal belief is that SolrCloud automates too much, which translates into a lack of flexibility and control. For example, in IDOL we had direct control over which content servers would process requests, while in SolrCloud requests for each shard are distributed among replicas in an unpredictable manner. Don’t get me wrong, I am all in favor of automation, but the control freak in me is screaming “NOOOOOOOO!!!!”.
I know there are a lot of elements missing here, since query performance and system utilization are heavily dependent on the data and the types of queries, and I haven’t shared that information with you yet, but don’t worry, more info will be coming soon.