I think scalability is where true engineering takes place. Every vendor publishes scalability documentation describing how much load their product can support. These figures are usually conservative so that the vendor doesn't promise too much. They need to be examined carefully: they will almost never give you all the information you need, only one piece of the puzzle.
Imagine you are deploying a system for around 10,000 users, of whom perhaps 2,000 are concurrent and performing a variety of tasks. The system will already contain data, and new data will be arriving hourly. The vendor's scalability documentation, however, covers a 2,000-user system with 100 concurrent users performing one or two common tasks against a much smaller data volume. Simply multiplying out the environment described in that documentation will not give you the performance you are looking for.
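Writing the ratios down makes the mismatch obvious. The snippet below simply restates the numbers from the scenario above; the point is that you would be asking for twenty times the tested concurrency, with a heavier task mix and more data on top, which is extrapolation rather than interpolation:

```python
# Figures from the scenario above; the "vendor" numbers come from their test case.
target_users, target_concurrent = 10_000, 2_000
vendor_users, vendor_concurrent = 2_000, 100

print(f"Total users:      {target_users / vendor_users:.0f}x the vendor's test")
print(f"Concurrent users: {target_concurrent / vendor_concurrent:.0f}x the vendor's test")
# 5x the users but 20x the concurrency -- plus a broader task mix and a larger
# data set. Scaling the tested environment linearly tells you very little about
# where the bottlenecks will actually appear.
```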
This is mainly because, depending on what users are doing, the bottleneck can appear in different parts of the system. This is where knowledge of backend technologies plays a crucial role: you will need to speed up the bottlenecked areas, and the only way to figure out how is through theory and experience.
For example, let's say that during operating hours a back-end database such as SQL Server is hard-hit and causing a bottleneck. Instead of the common shotgun approach of upgrading hardware and moving to faster storage, whip out Performance Monitor and start logging metrics. Windows Performance Monitor is a hugely underrated tool and can provide valuable insight into many system performance issues. In this scenario, however, it is not enough just to determine which hardware component is the bottleneck. You also need to monitor SQL-specific performance counters to understand why that component is being hammered, and then address the underlying issue.
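As a minimal sketch of that first logging pass (the CSV path and counter names below are assumptions, and the log is presumed to have been exported from a PerfMon data collector set, for example with relog.exe), a short Python script can summarize the counters that usually point at the guilty hardware component:

```python
import csv
from statistics import mean

# Hypothetical path to a PerfMon log exported to CSV (e.g. via relog.exe).
LOG_PATH = "perfmon_log.csv"

# Counters of interest; adjust these to whatever your collector actually logs.
COUNTERS = [
    r"PhysicalDisk(_Total)\Avg. Disk sec/Read",
    r"PhysicalDisk(_Total)\Avg. Disk sec/Write",
    r"SQLServer:Buffer Manager\Page life expectancy",
    r"Processor(_Total)\% Processor Time",
]

def summarize(log_path):
    """Print the average value of each counter of interest over the whole log."""
    with open(log_path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        # relog prefixes every column with the machine name, so match by suffix.
        wanted = {c: next((col for col in columns if col.endswith(c)), None)
                  for c in COUNTERS}
        samples = {c: [] for c in COUNTERS}
        for row in reader:
            for counter, col in wanted.items():
                if col:
                    try:
                        samples[counter].append(float(row[col]))
                    except (ValueError, TypeError):
                        pass  # skip blank or malformed samples
    for counter, values in samples.items():
        if values:
            print(f"{counter}: avg={mean(values):.4f} over {len(values)} samples")
        else:
            print(f"{counter}: no samples found")

if __name__ == "__main__":
    summarize(LOG_PATH)
```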
Continuing the example, PerfMon might show that the hard drives are the bottleneck; that is pretty common, since storage is usually the slowest part of the system. Drilling deeper, you may find the issue is constant reads from disk, at which point you can dig further and check the buffer cache hit ratio. If it is low, improving it (for example by giving SQL Server more memory or tuning indexes so fewer pages have to be read from disk) can increase performance without necessarily upgrading hardware. You may also notice that one storage array is being heavily read while another is underutilized; in that case, moving the database, or a few heavily read tables, to the underutilized array would also improve performance.
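Once suspicion falls on reads and caching, SQL Server's own DMVs can confirm it. The sketch below assumes a reachable instance and the pyodbc driver (the connection string and the TOP 10 cutoff are placeholders); it pulls the buffer cache hit ratio and the most heavily read database files, which maps almost directly onto the question of which array is doing all the reading:

```python
import pyodbc

# Hypothetical connection string; adjust server, database, and auth to your environment.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=master;Trusted_Connection=yes;"
)

BUFFER_CACHE_SQL = """
SELECT CAST(a.cntr_value AS FLOAT) / NULLIF(b.cntr_value, 0) * 100.0
FROM sys.dm_os_performance_counters AS a
JOIN sys.dm_os_performance_counters AS b
  ON b.object_name = a.object_name
 AND RTRIM(b.counter_name) = 'Buffer cache hit ratio base'
WHERE RTRIM(a.counter_name) = 'Buffer cache hit ratio'
  AND a.object_name LIKE '%Buffer Manager%';
"""

FILE_READS_SQL = """
SELECT TOP 10 DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_bytes_read,
       vfs.io_stall_read_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY vfs.num_of_bytes_read DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    cur = conn.cursor()

    # Share of page requests served from memory rather than read from disk.
    ratio = cur.execute(BUFFER_CACHE_SQL).fetchone()[0]
    print(f"Buffer cache hit ratio: {ratio:.2f}%")

    # Which database files (and therefore which arrays) take the most read traffic.
    print("\nMost-read database files since last restart:")
    for name, path, bytes_read, stall_ms in cur.execute(FILE_READS_SQL):
        print(f"{name:20} {path:60} {bytes_read / 1024**3:8.1f} GiB read, "
              f"{stall_ms} ms read stall")
```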
Remember, various storage architectures are better suited to reads or to writes. It's not uncommon for a single database server to use different arrays for different databases, or even for different tables within a database. The impact of other components should be considered as well.
Furthermore, we can seek optimization outside of the bottlenecked component. For example, let's say the front-end web servers are being heavily taxed. If the sessions are encrypted, the cause could be the large overhead of SSL and authentication. In that scenario, you can look into offloading SSL termination to a different front-end component, such as an application delivery controller.
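Before investing in an offload device, it helps to quantify what each new encrypted session actually costs. As a rough client-side sketch (the host name and sample count are placeholders), the following times a bare TCP connect against a TCP connect plus a full TLS handshake; the difference approximates the per-session handshake overhead, although the web servers' CPU counters tell the real story:

```python
import socket
import ssl
import time

# Hypothetical target; point this at one of your own front-end web servers.
HOST, PORT = "www.example.com", 443
SAMPLES = 20

def time_tcp_connect():
    """Time a bare TCP connection (no TLS)."""
    start = time.perf_counter()
    with socket.create_connection((HOST, PORT), timeout=5):
        pass
    return time.perf_counter() - start

def time_tls_handshake():
    """Time a TCP connection plus a full TLS handshake."""
    ctx = ssl.create_default_context()
    start = time.perf_counter()
    with socket.create_connection((HOST, PORT), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST):
            pass
    return time.perf_counter() - start

tcp = [time_tcp_connect() for _ in range(SAMPLES)]
tls = [time_tls_handshake() for _ in range(SAMPLES)]

avg_tcp = sum(tcp) / SAMPLES
avg_tls = sum(tls) / SAMPLES
print(f"Average TCP connect:         {avg_tcp * 1000:.1f} ms")
print(f"Average TCP + TLS handshake: {avg_tls * 1000:.1f} ms")
print(f"Approximate TLS overhead:    {(avg_tls - avg_tcp) * 1000:.1f} ms per new session")
```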
Don't underestimate simple tools when working on scalability. In most cases, Performance Monitor and Microsoft Excel with its analysis add-in are all you need.
Creatively scaling a solution is becoming a dying talent, because most of the time people find it easier to just throw more hardware at the problem.