Metrics Used In Testing
The Product Quality Measures - 1. Customer satisfaction index, 2. Delivered defect quantities, 3. Responsiveness (turnaround time) to users, 4. Product volatility, 5. Defect ratios, 6. Defect removal efficiency, 7. Complexity of delivered product, 8. Test coverage, 9. Cost of defects, 10. Costs of quality activities, 11. Re-work, 12. Reliability and Metrics for Evaluating Application System Testing.
The Product Quality Measures:
1. Customer satisfaction index
This index is surveyed before product delivery and after product delivery
(and on-going on a periodic basis, using standard questionnaires).The following are analyzed:
Number of system enhancement requests per year
Number of maintenance fix requests per year
User friendliness: call volume to customer service hotline
User friendliness: training time per new user
Number of product recalls or fix releases (software vendors)
Number of production re-runs (in-house information systems groups)
2. Delivered defect quantities
They are normalized per function point (or per LOC) at product delivery (first 3 months or first year of operation) or Ongoing (per year of operation) by level of severity, by category or cause, e.g.: requirements defect, design defect, code defect, documentation/on-line help defect, defect introduced by fixes, etc.
3. Responsiveness (turnaround time) to users
Turnaround time for defect fixes, by level of severity
Time for minor vs. major enhancements; actual vs. planned elapsed time
4. Product volatility
Ratio of maintenance fixes (to repair the system & bring it into compliance with specifications), vs. enhancement requests (requests by users to enhance or change functionality)
5. Defect ratios
Defects found after product delivery per function point.
Defects found after product delivery per LOC
Pre-delivery defects: annual post-delivery defects
Defects per function point of the system modifications
6. Defect removal efficiency
Number of post-release defects (found by clients in field operation), categorized by level of severity
Ratio of defects found internally prior to release (via inspections and testing), as a percentage of all defects
All defects include defects found internally plus externally (by customers) in the first year after product delivery
7. Complexity of delivered product
McCabe's cyclomatic complexity counts across the system
Halstead’s measure
Card's design complexity measures
Predicted defects and maintenance costs, based on complexity measures
8. Test coverage
Breadth of functional coverage
Percentage of paths, branches or conditions that were actually tested
Percentage by criticality level: perceived level of risk of paths
The ratio of the number of detected faults to the number of predicted faults.
9. Cost of defects
Business losses per defect that occurs during operation
Business interruption costs; costs of work-arounds
Lost sales and lost goodwill
Litigation costs resulting from defects
Annual maintenance cost (per function point)
Annual operating cost (per function point)
Measurable damage to your boss's career
10. Costs of quality activities
Costs of reviews, inspections and preventive measures
Costs of test planning and preparation
Costs of test execution, defect tracking, version and change control
Costs of diagnostics, debugging and fixing
Costs of tools and tool support
Costs of test case library maintenance
Costs of testing & QA education associated with the product
Costs of monitoring and oversight by the QA organization (if separate from the development and test organizations)
11. Re-work
Re-work effort (hours, as a percentage of the original coding hours)
Re-worked LOC (source lines of code, as a percentage of the total delivered LOC)
Re-worked software components (as a percentage of the total delivered components)
12. Reliability
Availability (percentage of time a system is available, versus the time the system is needed to be available)
Mean time between failure (MTBF).
Man time to repair (MTTR)
Reliability ratio (MTBF / MTTR)
Number of product recalls or fix releases
Number of production re-runs as a ratio of production runs
Metrics for Evaluating Application System Testing:
Metric = Formula
Test Coverage = Number of units (KLOC/FP) tested / total size of the system. (LOC represents Lines of Code)
Number of tests per unit size = Number of test cases per KLOC/FP (LOC represents Lines of Code).
Acceptance criteria tested = Acceptance criteria tested / total acceptance criteria
Defects per size = Defects detected / system size
Test cost (in %) = Cost of testing / total cost *100
Cost to locate defect = Cost of testing / the number of defects located
Achieving Budget = Actual cost of testing / Budgeted cost of testing
Defects detected in testing = Defects detected in testing / total system defects
Defects detected in production = Defects detected in production/system size
Quality of Testing = No of defects found during Testing/(No of defects found during testing + No of acceptance defects found after delivery) *100
Effectiveness of testing to business = Loss due to problems / total resources processed by the system.
System complaints = Number of third party complaints / number of transactions processed
Scale of Ten = Assessment of testing by giving rating in scale of 1 to 10
Source Code Analysis = Number of source code statements changed / total number of tests.
Effort Productivity = Test Planning Productivity = No of Test cases designed / Actual Effort for Design and Documentation
Test Execution Productivity = No of Test cycles executed / Actual Effort for testing
Monday, October 12, 2009
Bug Identification
Bug Reporting
This article highlights the essence and traits of finding bugs. It leaps to redefine the art, a tester should inculcate while finding a bug. It enumerates various artifacts in reporting a bug. Whereas, also voices the advocacy on the bugs that has been reported. The basic amenity of a tester being to fight for the bug until it is fixed.
Introduction:
As testers, we all agree to the fact that the basic aim of the Tester is to decipher bugs. Whenever a build appears for testing, the primary objective is to find out as many bugs as possible from every corner of the application. To accomplish this task as perfection, we perform testing from various perspectives. We strain the application before us through various kinds of strainers like boundary value analysis, validation checks, verification checks, GUI, interoperability, integration tests, functional – business concepts checking, backend testing (like using SQL commands into db or injections), security tests, and many more. This makes us to drill deep into the application as well as the business.
We would agree to the fact that Bug Awareness is of no use until it is well documented. Here comes the role of BUG REPORTS. The bug reports are our primary work product. This is what people outside the testing group notices. These reports play an important role in the Software Development Life Cycle – in various phases as they are referenced by testers, developers, managers, top shots and not to forget the clients who these days demand for the test reports. So, the Bug Reports are remembered the most.
Once the bugs are reported by the testers and submitted to the developers to work upon, we often see some kinds of confrontations – there are humiliations which testers face sometimes, there are cold wars – nonetheless the discussions take the shape of mini quarrels – but at times testers and developers still say the same thing or they are correct but the depiction of their understanding are different and that makes all the differences. In such a situation, we come to a stand-apart that the best tester is not the one who finds most of the bugs or the one who embarrasses most programmers but is the one who gets most of the bugs fixed.
Bug Reporting
The first aim of the Bug Report is to let the programmer see the failure. The Bug Report gives the detailed descriptions so that the programmers can make the Bug fail for them. In case, the Bug Report does not accomplish this mission, there can be back flows from the development team saying – not a bug, cannot reproduce and many other reasons.
Hence it is important that the BUG REPORT be prepared by the testers with utmost proficiency and specificity. It should basically describe the famous 3 What's, well described as:
What we did:
Module, Page/Window – names that we navigate to
Test data entered and selected
Buttons and the order of clicking
What we saw:
GUI Flaws
Missing or No Validations
Error messages
Incorrect Navigations
What we expected to see:
GUI Flaw: give screenshots with highlight
Incorrect message – give correct language, message
Validations – give correct validations
Error messages – justify with screenshots
Navigations – mention the actual pages
Pointers to effective reporting can be well derived from above three What's. These are:
1. BUG DESCRIPTION should be clearly identifiable – a bug description is a short statement that briefly describes what exactly a problem is. Might be a problem required 5-6 steps to be produced, but this statement should clearly identify what exactly a problem is. Problem might be a server error. But description should be clear saying Server Error occurs while saving a new record in the Add Contact window.
2. Bug should be reported after building a proper context – PRE-CONDITIONS for reproducing the bug should be defined so as to reach the exact point where bug can be reproduced. For example: If a server error appears while editing a record in the contacts list, then it should be well defined as a pre-condition to create a new contact and save successfully. Double click this created contact from the contacts list to open the contact details – make changes and hit save button.
3. STEPS should be clear with short and meaningful sentences – nobody would wish to study the entire paragraph of long complex words and sentences. Make your report step wise by numbering 1,2,3…Make each sentence small and clear. Only write those findings or observations which are necessary for this respective bug. Writing facts that are already known or something which does not help in reproducing a bug makes the report unnecessarily complex and lengthy.
4. Cite examples wherever necessary – combination of values, test data: Most of the times it happens that the bug can be reproduced only with a specific set of data or values. Hence, instead of writing ambiguous statement like enter an invalid phone number and hit save…one should mention the data/value entered….like enter the phone number as 012aaa@$%.- and save.
5. Give references to specifications – If any bug arises that is a contradictive to the SRS or any functional document of the project for that matter then it is always proactive to mention the section, page number for reference. For example: Refer page 14 of SRS section 2-14.
6. Report without passing any kind of judgment in the bug descriptions – the bug report should not be judgmental in any case as this leads to controversy and gives an impression of bossy. Remember, a tester should always be polite so as to keep his bug up and meaningful. Being judgmental makes developers think as though testers know more than them and as a result gives birth to a psychological adversity. To avoid this, we can use the word suggestion – and discuss with the developers or team lead about this. We can also refer to some application or some module or some page in the same application to strengthen our point.
7. Assign severity and priority – SEVERITY is the state or quality of being severe. Severity tells us HOW BAD the BUG is. It defines the importance of BUG from FUNCTIONALITY point of view and implies adherence to rigorous standards or high principles. Severity levels can be defined as follows:
Urgent/Show – stopper: Like system crash or error message forcing to close the window, System stops working totally or partially. A major area of the users system is affected by the incident and It is significant to business processes.
Medium/Workaround: When a problem is required in the specs but tester can go on with testing. It affects a more isolated piece of functionality. It occurs only at one or two customers or is intermittent.
Low: Failures that are unlikely to occur in normal use. Problems do not impact use of the product in any substantive way. Have no or very low impact to business processes
State exact error messages.
PRIORITY means something Deserves Prior Attention. It represents the importance of a bug from Customer point of view. Voices precedence established by urgency and it is associated with scheduling a bug Priority Levels can be defined as follows:
High: This has a major impact on the customer. This must be fixed immediately.
Medium: This has a major impact on the customer. The problem should be fixed before release of the current version in development or a patch must be issued if possible.
Low: This has a minor impact on the customer. The flaw should be fixed if there is time, but it can be deferred until the next release.
8. Provide Screenshots – This is the best approach. For any error say object references, server error, GUI issues, message prompts and any other errors that we can see – should always be saved as a screenshot and be attached to the bug for the proof. It helps the developers understand the issue more specifically
This article highlights the essence and traits of finding bugs. It leaps to redefine the art, a tester should inculcate while finding a bug. It enumerates various artifacts in reporting a bug. Whereas, also voices the advocacy on the bugs that has been reported. The basic amenity of a tester being to fight for the bug until it is fixed.
Introduction:
As testers, we all agree to the fact that the basic aim of the Tester is to decipher bugs. Whenever a build appears for testing, the primary objective is to find out as many bugs as possible from every corner of the application. To accomplish this task as perfection, we perform testing from various perspectives. We strain the application before us through various kinds of strainers like boundary value analysis, validation checks, verification checks, GUI, interoperability, integration tests, functional – business concepts checking, backend testing (like using SQL commands into db or injections), security tests, and many more. This makes us to drill deep into the application as well as the business.
We would agree to the fact that Bug Awareness is of no use until it is well documented. Here comes the role of BUG REPORTS. The bug reports are our primary work product. This is what people outside the testing group notices. These reports play an important role in the Software Development Life Cycle – in various phases as they are referenced by testers, developers, managers, top shots and not to forget the clients who these days demand for the test reports. So, the Bug Reports are remembered the most.
Once the bugs are reported by the testers and submitted to the developers to work upon, we often see some kinds of confrontations – there are humiliations which testers face sometimes, there are cold wars – nonetheless the discussions take the shape of mini quarrels – but at times testers and developers still say the same thing or they are correct but the depiction of their understanding are different and that makes all the differences. In such a situation, we come to a stand-apart that the best tester is not the one who finds most of the bugs or the one who embarrasses most programmers but is the one who gets most of the bugs fixed.
Bug Reporting
The first aim of the Bug Report is to let the programmer see the failure. The Bug Report gives the detailed descriptions so that the programmers can make the Bug fail for them. In case, the Bug Report does not accomplish this mission, there can be back flows from the development team saying – not a bug, cannot reproduce and many other reasons.
Hence it is important that the BUG REPORT be prepared by the testers with utmost proficiency and specificity. It should basically describe the famous 3 What's, well described as:
What we did:
Module, Page/Window – names that we navigate to
Test data entered and selected
Buttons and the order of clicking
What we saw:
GUI Flaws
Missing or No Validations
Error messages
Incorrect Navigations
What we expected to see:
GUI Flaw: give screenshots with highlight
Incorrect message – give correct language, message
Validations – give correct validations
Error messages – justify with screenshots
Navigations – mention the actual pages
Pointers to effective reporting can be well derived from above three What's. These are:
1. BUG DESCRIPTION should be clearly identifiable – a bug description is a short statement that briefly describes what exactly a problem is. Might be a problem required 5-6 steps to be produced, but this statement should clearly identify what exactly a problem is. Problem might be a server error. But description should be clear saying Server Error occurs while saving a new record in the Add Contact window.
2. Bug should be reported after building a proper context – PRE-CONDITIONS for reproducing the bug should be defined so as to reach the exact point where bug can be reproduced. For example: If a server error appears while editing a record in the contacts list, then it should be well defined as a pre-condition to create a new contact and save successfully. Double click this created contact from the contacts list to open the contact details – make changes and hit save button.
3. STEPS should be clear with short and meaningful sentences – nobody would wish to study the entire paragraph of long complex words and sentences. Make your report step wise by numbering 1,2,3…Make each sentence small and clear. Only write those findings or observations which are necessary for this respective bug. Writing facts that are already known or something which does not help in reproducing a bug makes the report unnecessarily complex and lengthy.
4. Cite examples wherever necessary – combination of values, test data: Most of the times it happens that the bug can be reproduced only with a specific set of data or values. Hence, instead of writing ambiguous statement like enter an invalid phone number and hit save…one should mention the data/value entered….like enter the phone number as 012aaa@$%.- and save.
5. Give references to specifications – If any bug arises that is a contradictive to the SRS or any functional document of the project for that matter then it is always proactive to mention the section, page number for reference. For example: Refer page 14 of SRS section 2-14.
6. Report without passing any kind of judgment in the bug descriptions – the bug report should not be judgmental in any case as this leads to controversy and gives an impression of bossy. Remember, a tester should always be polite so as to keep his bug up and meaningful. Being judgmental makes developers think as though testers know more than them and as a result gives birth to a psychological adversity. To avoid this, we can use the word suggestion – and discuss with the developers or team lead about this. We can also refer to some application or some module or some page in the same application to strengthen our point.
7. Assign severity and priority – SEVERITY is the state or quality of being severe. Severity tells us HOW BAD the BUG is. It defines the importance of BUG from FUNCTIONALITY point of view and implies adherence to rigorous standards or high principles. Severity levels can be defined as follows:
Urgent/Show – stopper: Like system crash or error message forcing to close the window, System stops working totally or partially. A major area of the users system is affected by the incident and It is significant to business processes.
Medium/Workaround: When a problem is required in the specs but tester can go on with testing. It affects a more isolated piece of functionality. It occurs only at one or two customers or is intermittent.
Low: Failures that are unlikely to occur in normal use. Problems do not impact use of the product in any substantive way. Have no or very low impact to business processes
State exact error messages.
PRIORITY means something Deserves Prior Attention. It represents the importance of a bug from Customer point of view. Voices precedence established by urgency and it is associated with scheduling a bug Priority Levels can be defined as follows:
High: This has a major impact on the customer. This must be fixed immediately.
Medium: This has a major impact on the customer. The problem should be fixed before release of the current version in development or a patch must be issued if possible.
Low: This has a minor impact on the customer. The flaw should be fixed if there is time, but it can be deferred until the next release.
8. Provide Screenshots – This is the best approach. For any error say object references, server error, GUI issues, message prompts and any other errors that we can see – should always be saved as a screenshot and be attached to the bug for the proof. It helps the developers understand the issue more specifically
Wednesday, October 7, 2009
MAIN OBJECTIVES OF TESTING
Objectives of Testing
Finding of Errors - Primary Goal
Trying to prove that software does not work. Thus, indirectly verifying that software meets requirements
Software Testing
Software testing is the process of testing the functionality and correctness of software by running it. Software testing is usually performed for one of two reasons:
(1) defect detection
(2) reliability or Process of executing a computer program and comparing the actual behavior with the expected behavior
What is the goal of Software Testing?
* Demonstrate That Faults Are Not Present
* Find Errors
* Ensure That All The Functionality Is Implemented
* Ensure The Customer Will Be Able To Get His Work Done
Modes of Testing
* Static Static Analysis doesn¡¦t involve actual program execution. The code is examined, it is tested without being executed Ex: - Reviews
* Dynamic In Dynamic, The code is executed. Ex:- Unit testing
Testing methods
* White box testing Use the control structure of the procedural design to derive test cases.
* Black box testing Derive sets of input conditions that will fully exercise the functional requirements for a program.
* Integration Assembling parts of a system
Verification and Validation
* Verification: Are we doing the job right? The set of activities that ensure that software correctly implements a specific function. (i.e. The process of determining whether or not products of a given phase of the software development cycle fulfill the requirements established during previous phase). Ex: - Technical reviews, quality & configuration audits, performance monitoring, simulation, feasibility study, documentation review, database review, algorithm analysis etc
* Validation: Are we doing the right job? The set of activities that ensure that the software that has been built is traceable to customer requirements.(An attempt to find errors by executing the program in a real environment ). Ex: - Unit testing, system testing and installation testing etc
What's a 'test case'?
A test case is a document that describes an input, action, or event and an expected response, to determine if a feature of an application is working correctly. A test case should contain particulars such as test case identifier, test case name, objective, test conditions/setup, input data requirements, steps, and expected results
What is a software error ?
A mismatch between the program and its specification is an error in the program if and only if the specifications exists and is correct.
Risk Driven Testing
What if there isn't enough time for thorough testing?
Use risk analysis to determine where testing should be focused. Since it's rarely possible to test every possible aspect of an application, every possible combination of events, every dependency, or everything that could go wrong, risk analysis is appropriate to most software development projects. This requires judgement skills, common sense, and experience.
Considerations can include:
- Which functionality is most important to the project's intended purpose?
- Which functionality is most visible to the user?
- Which aspects of the application are most important to the customer?
- Which parts of the code are most complex, and thus most subject to errors?
- What do the developers think are the highest-risk aspects of the application?
- What kinds of tests could easily cover multiple functionality?
Whenever there's too much to do and not enough time to do it, we have to prioritize so that at least the most important things get done. So prioritization has received a lot of attention. The approach is called Risk Driven Testing. Here's how you do it: Take the pieces of your system, whatever you use - modules, functions, section of the requirements - and rate each piece on two variables, Impact and Likelihood.
Finding of Errors - Primary Goal
Trying to prove that software does not work. Thus, indirectly verifying that software meets requirements
Software Testing
Software testing is the process of testing the functionality and correctness of software by running it. Software testing is usually performed for one of two reasons:
(1) defect detection
(2) reliability or Process of executing a computer program and comparing the actual behavior with the expected behavior
What is the goal of Software Testing?
* Demonstrate That Faults Are Not Present
* Find Errors
* Ensure That All The Functionality Is Implemented
* Ensure The Customer Will Be Able To Get His Work Done
Modes of Testing
* Static Static Analysis doesn¡¦t involve actual program execution. The code is examined, it is tested without being executed Ex: - Reviews
* Dynamic In Dynamic, The code is executed. Ex:- Unit testing
Testing methods
* White box testing Use the control structure of the procedural design to derive test cases.
* Black box testing Derive sets of input conditions that will fully exercise the functional requirements for a program.
* Integration Assembling parts of a system
Verification and Validation
* Verification: Are we doing the job right? The set of activities that ensure that software correctly implements a specific function. (i.e. The process of determining whether or not products of a given phase of the software development cycle fulfill the requirements established during previous phase). Ex: - Technical reviews, quality & configuration audits, performance monitoring, simulation, feasibility study, documentation review, database review, algorithm analysis etc
* Validation: Are we doing the right job? The set of activities that ensure that the software that has been built is traceable to customer requirements.(An attempt to find errors by executing the program in a real environment ). Ex: - Unit testing, system testing and installation testing etc
What's a 'test case'?
A test case is a document that describes an input, action, or event and an expected response, to determine if a feature of an application is working correctly. A test case should contain particulars such as test case identifier, test case name, objective, test conditions/setup, input data requirements, steps, and expected results
What is a software error ?
A mismatch between the program and its specification is an error in the program if and only if the specifications exists and is correct.
Risk Driven Testing
What if there isn't enough time for thorough testing?
Use risk analysis to determine where testing should be focused. Since it's rarely possible to test every possible aspect of an application, every possible combination of events, every dependency, or everything that could go wrong, risk analysis is appropriate to most software development projects. This requires judgement skills, common sense, and experience.
Considerations can include:
- Which functionality is most important to the project's intended purpose?
- Which functionality is most visible to the user?
- Which aspects of the application are most important to the customer?
- Which parts of the code are most complex, and thus most subject to errors?
- What do the developers think are the highest-risk aspects of the application?
- What kinds of tests could easily cover multiple functionality?
Whenever there's too much to do and not enough time to do it, we have to prioritize so that at least the most important things get done. So prioritization has received a lot of attention. The approach is called Risk Driven Testing. Here's how you do it: Take the pieces of your system, whatever you use - modules, functions, section of the requirements - and rate each piece on two variables, Impact and Likelihood.
Monday, October 5, 2009
PERFORMANCE vs LOAD vs STRESS TESTING
Performance Vs. Load Vs. Stress Testing
Here's a good interview question for a tester: how do you define performance/load/stress testing? Many times people use these terms interchangeably, but they have in fact quite different meanings. This post is a quick review of these concepts, based on my own experience, but also using definitions from testing literature -- in particular: "Testing computer software" by Kaner et al, "Software testing techniques" by Loveland et al, and "Testing applications on the Web" by Nguyen et al.
Performance testing
The goal of performance testing is not to find bugs, but to eliminate bottlenecks and establish a baseline for future regression testing. To conduct performance testing is to engage in a carefully controlled process of measurement and analysis. Ideally, the software under test is already stable enough so that this process can proceed smoothly.
A clearly defined set of expectations is essential for meaningful performance testing. If you don't know where you want to go in terms of the performance of the system, then it matters little which direction you take (remember Alice and the Cheshire Cat?). For example, for a Web application, you need to know at least two things:
expected load in terms of concurrent users or HTTP connections
acceptable response time
Once you know where you want to be, you can start on your way there by constantly increasing the load on the system while looking for bottlenecks. To take again the example of a Web application, these bottlenecks can exist at multiple levels, and to pinpoint them you can use a variety of tools:
.at the application level, developers can use profilers to spot inefficiencies in their code (for example poor search algorithms)
.at the database level, developers and DBAs can use database-specific profilers and query optimizers
at the operating system level, system engineers can use utilities such as top, vmstat, iostat (on Unix-type systems) and PerfMon (on Windows) to monitor hardware resources such as CPU, memory, swap, disk I/O; specialized kernel monitoring software can also be used
.at the network level, network engineers can use packet sniffers such as tcpdump, network protocol analyzers such as ethereal, and various utilities such as netstat, MRTG, ntop, mii-tool
From a testing point of view, the activities described above all take a white-box approach, where the system is inspected and monitored "from the inside out" and from a variety of angles. Measurements are taken and analyzed, and as a result, tuning is done.
However, testers also take a black-box approach in running the load tests against the system under test. For a Web application, testers will use tools that simulate concurrent users/HTTP connections and measure response times. Some lightweight open source tools I've used in the past for this purpose are ab, siege, httperf. A more heavyweight tool I haven't used yet is OpenSTA. I also haven't used The Grinder yet, but it is high on my TODO list.
When the results of the load test indicate that performance of the system does not meet its expected goals, it is time for tuning, starting with the application and the database. You want to make sure your code runs as efficiently as possible and your database is optimized on a given OS/hardware configurations. TDD practitioners will find very useful in this context a framework such as Mike Clark's jUnitPerf, which enhances existing unit test code with load test and timed test functionality. Once a particular function or method has been profiled and tuned, developers can then wrap its unit tests in jUnitPerf and ensure that it meets performance requirements of load and timing.
If, after tuning the application and the database, the system still doesn't meet its expected goals in terms of performance, a wide array of tuning procedures is available at the all the levels discussed before. Here are some examples of things you can do to enhance the performance of a Web application outside of the application code per se:
Use Web cache mechanisms, such as the one provided by Squid
Publish highly-requested Web pages statically, so that they don't hit the database
Scale the Web server farm horizontally via load balancing
Scale the database servers horizontally and split them into read/write servers and read-only servers, then load balance the read-only servers
Scale the Web and database servers vertically, by adding more hardware resources (CPU, RAM, disks)
Increase the available network bandwidth
Performance tuning can sometimes be more art than science, due to the sheer complexity of the systems involved in a modern Web application. Care must be taken to modify one variable at a time and redo the measurements, otherwise multiple changes can have subtle interactions that are hard to qualify and repeat.
In a standard test environment such as a test lab, it will not always be possible to replicate the production server configuration. In such cases, a staging environment is used which is a subset of the production environment. The expected performance of the system needs to be scaled down accordingly.
The cycle "run load test->measure performance->tune system" is repeated until the system under test achieves the expected levels of performance. At this point, testers have a baseline for how the system behaves under normal conditions. This baseline can then be used in regression tests to gauge how well a new version of the software performs.
Another common goal of performance testing is to establish benchmark numbers for the system under test. There are many industry-standard benchmarks such as the ones published by TPC, and many hardware/software vendors will fine-tune their systems in such ways as to obtain a high ranking in the TCP top-tens. It is common knowledge that one needs to be wary of any performance claims that do not include a detailed specification of all the hardware and software configurations that were used in that particular test.
Load testing
We have already seen load testing as part of the process of performance testing and tuning. In that context, it meant constantly increasing the load on the system via automated tools. For a Web application, the load is defined in terms of concurrent users or HTTP connections.
In the testing literature, the term "load testing" is usually defined as the process of exercising the system under test by feeding it the largest tasks it can operate with. Load testing is sometimes called volume testing, or longevity/endurance testing.
Examples of volume
testing a word processor by editing a very large document
testing a printer by sending it a very large job
testing a mail server with thousands of users mailboxes
a specific case of volume testing is zero-volume testing, where the system is fed empty tasks
Examples of longevity/endurance testing:
testing a client-server application by running the client in a loop against the server over an extended period of time
Goals of load testing:
expose bugs that do not surface in cursory testing, such as memory management bugs, memory leaks, buffer overflows, etc.
ensure that the application meets the performance baseline established during performance testing. This is done by running regression tests against the application at a specified maximum load.
Although performance testing and load testing can seem similar, their goals are different. On one hand, performance testing uses load testing techniques and tools for measurement and benchmarking purposes and uses various load levels. On the other hand, load testing operates at a predefined load level, usually the highest load that the system can accept while still functioning properly. Note that load testing does not aim to break the system by overwhelming it, but instead tries to keep the system constantly humming like a well-oiled machine.
In the context of load testing, I want to emphasize the extreme importance of having large datasets available for testing. In my experience, many important bugs simply do not surface unless you deal with very large entities such thousands of users in repositories such as LDAP/NIS/Active Directory, thousands of mail server mailboxes, multi-gigabyte tables in databases, deep file/directory hierarchies on file systems, etc. Testers obviously need automated tools to generate these large data sets, but fortunately any good scripting language worth its salt will do the job.
Stress testing
Stress testing tries to break the system under test by overwhelming its resources or by taking resources away from it (in which case it is sometimes called negative testing). The main purpose behind this madness is to make sure that the system fails and recovers gracefully -- this quality is known as recoverability.
Where performance testing demands a controlled environment and repeatable measurements, stress testing joyfully induces chaos and unpredictability. To take again the example of a Web application, here are some ways in which stress can be applied to the system:
double the baseline number for concurrent users/HTTP connections
randomly shut down and restart ports on the network switches/routers that connect the servers (via SNMP commands for example)
take the database offline, then restart it
rebuild a RAID array while the system is running
run processes that consume resources (CPU, memory, disk, network) on the Web and database servers
I'm sure devious testers can enhance this list with their favorite ways of breaking systems. However, stress testing does not break the system purely for the pleasure of breaking it, but instead it allows testers to observe how the system reacts to failure. Does it save its state or does it crash suddenly? Does it just hang and freeze or does it fail gracefully? On restart, is it able to recover from the last good state? Does it print out meaningful error messages to the user, or does it merely display incomprehensible hex codes? Is the security of the system compromised because of unexpected failures? And the list goes on.
Here's a good interview question for a tester: how do you define performance/load/stress testing? Many times people use these terms interchangeably, but they have in fact quite different meanings. This post is a quick review of these concepts, based on my own experience, but also using definitions from testing literature -- in particular: "Testing computer software" by Kaner et al, "Software testing techniques" by Loveland et al, and "Testing applications on the Web" by Nguyen et al.
Performance testing
The goal of performance testing is not to find bugs, but to eliminate bottlenecks and establish a baseline for future regression testing. To conduct performance testing is to engage in a carefully controlled process of measurement and analysis. Ideally, the software under test is already stable enough so that this process can proceed smoothly.
A clearly defined set of expectations is essential for meaningful performance testing. If you don't know where you want to go in terms of the performance of the system, then it matters little which direction you take (remember Alice and the Cheshire Cat?). For example, for a Web application, you need to know at least two things:
expected load in terms of concurrent users or HTTP connections
acceptable response time
Once you know where you want to be, you can start on your way there by constantly increasing the load on the system while looking for bottlenecks. To take again the example of a Web application, these bottlenecks can exist at multiple levels, and to pinpoint them you can use a variety of tools:
.at the application level, developers can use profilers to spot inefficiencies in their code (for example poor search algorithms)
.at the database level, developers and DBAs can use database-specific profilers and query optimizers
at the operating system level, system engineers can use utilities such as top, vmstat, iostat (on Unix-type systems) and PerfMon (on Windows) to monitor hardware resources such as CPU, memory, swap, disk I/O; specialized kernel monitoring software can also be used
.at the network level, network engineers can use packet sniffers such as tcpdump, network protocol analyzers such as ethereal, and various utilities such as netstat, MRTG, ntop, mii-tool
From a testing point of view, the activities described above all take a white-box approach, where the system is inspected and monitored "from the inside out" and from a variety of angles. Measurements are taken and analyzed, and as a result, tuning is done.
However, testers also take a black-box approach in running the load tests against the system under test. For a Web application, testers will use tools that simulate concurrent users/HTTP connections and measure response times. Some lightweight open source tools I've used in the past for this purpose are ab, siege, httperf. A more heavyweight tool I haven't used yet is OpenSTA. I also haven't used The Grinder yet, but it is high on my TODO list.
When the results of the load test indicate that performance of the system does not meet its expected goals, it is time for tuning, starting with the application and the database. You want to make sure your code runs as efficiently as possible and your database is optimized on a given OS/hardware configurations. TDD practitioners will find very useful in this context a framework such as Mike Clark's jUnitPerf, which enhances existing unit test code with load test and timed test functionality. Once a particular function or method has been profiled and tuned, developers can then wrap its unit tests in jUnitPerf and ensure that it meets performance requirements of load and timing.
If, after tuning the application and the database, the system still doesn't meet its expected goals in terms of performance, a wide array of tuning procedures is available at the all the levels discussed before. Here are some examples of things you can do to enhance the performance of a Web application outside of the application code per se:
Use Web cache mechanisms, such as the one provided by Squid
Publish highly-requested Web pages statically, so that they don't hit the database
Scale the Web server farm horizontally via load balancing
Scale the database servers horizontally and split them into read/write servers and read-only servers, then load balance the read-only servers
Scale the Web and database servers vertically, by adding more hardware resources (CPU, RAM, disks)
Increase the available network bandwidth
Performance tuning can sometimes be more art than science, due to the sheer complexity of the systems involved in a modern Web application. Care must be taken to modify one variable at a time and redo the measurements, otherwise multiple changes can have subtle interactions that are hard to qualify and repeat.
In a standard test environment such as a test lab, it will not always be possible to replicate the production server configuration. In such cases, a staging environment is used which is a subset of the production environment. The expected performance of the system needs to be scaled down accordingly.
The cycle "run load test->measure performance->tune system" is repeated until the system under test achieves the expected levels of performance. At this point, testers have a baseline for how the system behaves under normal conditions. This baseline can then be used in regression tests to gauge how well a new version of the software performs.
Another common goal of performance testing is to establish benchmark numbers for the system under test. There are many industry-standard benchmarks such as the ones published by TPC, and many hardware/software vendors will fine-tune their systems in such ways as to obtain a high ranking in the TCP top-tens. It is common knowledge that one needs to be wary of any performance claims that do not include a detailed specification of all the hardware and software configurations that were used in that particular test.
Load testing
We have already seen load testing as part of the process of performance testing and tuning. In that context, it meant constantly increasing the load on the system via automated tools. For a Web application, the load is defined in terms of concurrent users or HTTP connections.
In the testing literature, the term "load testing" is usually defined as the process of exercising the system under test by feeding it the largest tasks it can operate with. Load testing is sometimes called volume testing, or longevity/endurance testing.
Examples of volume
testing a word processor by editing a very large document
testing a printer by sending it a very large job
testing a mail server with thousands of users mailboxes
a specific case of volume testing is zero-volume testing, where the system is fed empty tasks
Examples of longevity/endurance testing:
testing a client-server application by running the client in a loop against the server over an extended period of time
Goals of load testing:
expose bugs that do not surface in cursory testing, such as memory management bugs, memory leaks, buffer overflows, etc.
ensure that the application meets the performance baseline established during performance testing. This is done by running regression tests against the application at a specified maximum load.
Although performance testing and load testing can seem similar, their goals are different. On one hand, performance testing uses load testing techniques and tools for measurement and benchmarking purposes and uses various load levels. On the other hand, load testing operates at a predefined load level, usually the highest load that the system can accept while still functioning properly. Note that load testing does not aim to break the system by overwhelming it, but instead tries to keep the system constantly humming like a well-oiled machine.
In the context of load testing, I want to emphasize the extreme importance of having large datasets available for testing. In my experience, many important bugs simply do not surface unless you deal with very large entities such thousands of users in repositories such as LDAP/NIS/Active Directory, thousands of mail server mailboxes, multi-gigabyte tables in databases, deep file/directory hierarchies on file systems, etc. Testers obviously need automated tools to generate these large data sets, but fortunately any good scripting language worth its salt will do the job.
Stress testing
Stress testing tries to break the system under test by overwhelming its resources or by taking resources away from it (in which case it is sometimes called negative testing). The main purpose behind this madness is to make sure that the system fails and recovers gracefully -- this quality is known as recoverability.
Where performance testing demands a controlled environment and repeatable measurements, stress testing joyfully induces chaos and unpredictability. To take again the example of a Web application, here are some ways in which stress can be applied to the system:
double the baseline number for concurrent users/HTTP connections
randomly shut down and restart ports on the network switches/routers that connect the servers (via SNMP commands for example)
take the database offline, then restart it
rebuild a RAID array while the system is running
run processes that consume resources (CPU, memory, disk, network) on the Web and database servers
I'm sure devious testers can enhance this list with their favorite ways of breaking systems. However, stress testing does not break the system purely for the pleasure of breaking it, but instead it allows testers to observe how the system reacts to failure. Does it save its state or does it crash suddenly? Does it just hang and freeze or does it fail gracefully? On restart, is it able to recover from the last good state? Does it print out meaningful error messages to the user, or does it merely display incomprehensible hex codes? Is the security of the system compromised because of unexpected failures? And the list goes on.
Subscribe to:
Posts (Atom)