Tuesday, November 8, 2016

A new method to Integrate SoapUI test results in Yandex Allure Dashboard

Here is a quick but effective solution for embedding SoapUI results into Yandex Allure. I know SoapUI supports Maven, but the SmartBear folks have become increasingly restrictive lately. They are doing their best to push all users toward the Pro version, which is now fully integrated into the Ready!API package.

I tried adding the Allure JUnit listeners to the SoapUI pom files, but without much success, so I decided to make it work my own way. This approach runs more smoothly and is less sensitive to environment or version changes.

What we need:
  • A local or remote database server that is accessible over JDBC
  • The connector jar added to the SoapUI ext folder. For Ready!API, I had to copy the MySQL connector jar directly into the /lib folder
  • In your Java/TestNG code, call the SoapUI project test case using cmd and testrunner. You might find that assertion errors cause the process runner to wait until it times out. To prevent this, pull a trick out of your sleeve: "cmd /B start /WAIT cmd /C". This makes the process stop immediately if it fails with stderr messages.
  • In addition, assign a unique identifier to your testrunner process so you can easily extract the results using that identifier. Here is the function I wrote:

    // Needs java.io.BufferedReader, java.io.InputStreamReader, TestNG's Assert
    // and Allure's @Step annotation; saveTextLog(...) is a helper defined elsewhere in the class.
    @Step("Running SoapUI Process")
    public boolean processrun(String testRunnerPath, String suiteName, String testCaseName, String uuid, String project) {
        boolean finished = false;
        try {
            // "start /WAIT" keeps the runner from hanging when assertions fail
            String pString = "cmd /B start /WAIT cmd.exe /C \"" + testRunnerPath + " -s" + suiteName
                    + " -c" + testCaseName + " -r -Puuid=" + uuid + " " + project + "\"";
            System.out.println(pString);

            Process p = Runtime.getRuntime().exec(pString);
            try (BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = input.readLine()) != null) {
                    System.out.println(line);
                    // The summary line is printed because of the -r flag, once the run is over
                    if (line.contains("TestCaseRunner Summary")) {
                        finished = true;
                        Thread.sleep(3000);
                        p.destroy();
                        p.waitFor();   // was p.wait(1l), which throws outside a synchronized block
                        break;
                    }
                }
            }
            return finished;
        } catch (Exception e) {
            if (finished) return true;
            System.out.println("--------------->Error Occurred!");
            saveTextLog("Unexpected-Error", e.getMessage());
            System.out.println(e.getMessage());
            Assert.fail("Unexpected Test Execution Error");
            return false;
        }
    }

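As a side note, here is a minimal sketch of how the same testrunner arguments could be assembled as a list, together with a fresh uuid. It is an alternative to the single command string, not the method used in this post: it leaves out the "cmd /B start /WAIT" wrapper, and the class and method names (TestRunnerCommand, build, newRunId) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class TestRunnerCommand {

    /** Generates the unique identifier that is later passed as -Puuid=... */
    public static String newRunId() {
        return UUID.randomUUID().toString();
    }

    /** Builds the testrunner arguments; each list element becomes exactly one process argument. */
    public static List<String> build(String testRunnerPath, String suiteName,
                                     String testCaseName, String uuid, String project) {
        List<String> command = new ArrayList<>();
        command.add(testRunnerPath);
        command.add("-s" + suiteName);      // test suite to run
        command.add("-c" + testCaseName);   // test case to run
        command.add("-r");                  // print the summary report
        command.add("-Puuid=" + uuid);      // project property read by the TearDown script
        command.add(project);
        return command;
    }
}
```

A list like this can be handed to `new ProcessBuilder(command).redirectErrorStream(true).start()`, which avoids quoting problems when a suite or test case name contains spaces. Whether it also avoids the time-out described above is something you would have to verify on your own setup.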

  • Please note that the processrun function looks for the "TestCaseRunner Summary" string, which is always generated when you use the "-r" flag.
  • The uuid is a unique identifier string. You can simply generate it with "uuid = java.util.UUID.randomUUID().toString()"
  • Write some Groovy code in the TearDown pane to record the first assertion failure in your table, together with the uuid. Here is how to get the step name, status, and assertion errors:


/*
 * Recording the results from each test case into MySQL
 * Author: ppirooznia
 */
import com.eviware.soapui.impl.wsdl.teststeps.*
import groovy.sql.*
import java.util.zip.GZIPOutputStream

def driver = 'com.mysql.jdbc.Driver'
def con = Sql.newInstance("jdbc:mysql://127.0.0.1:3306/test?autoReconnect=true&useSSL=false", "username", "password", driver)
def uuid = context.expand('${#Project#uuid}')
log.info(testRunner.testCase.testSuite.name)

boolean next = true       // stays true until one result row has been written
boolean comma = false
boolean getinfo = false   // set when the current step has a failed assertion
assertion = ""
message = ""
request = ""
response = ""
lquery = ""
testsuitename = testRunner.testCase.testSuite.name
testcasename = testRunner.testCase.name

def tc = testRunner.testCase.testSuite.project.testSuites[testsuitename].getTestCaseByName(testcasename)
tc.testSteps.each { name, props ->
    step = tc.getTestStepByName("$name")
    if (step instanceof WsdlTestRequestStep || step instanceof RestTestRequestStep || step instanceof JdbcRequestTestStep || step instanceof HttpTestRequestStep) {
        props.getAssertionList().each {
            log.info "$it.label - $it.status - $it.errors - $uuid"
            tname = "$name"
            if ("$it.status" == "FAILED") {
                getinfo = true
                if (comma) {
                    assertion += ", $it.label"
                    message += ", $it.errors"
                } else {
                    assertion += "$it.label"
                    message += "$it.errors"
                    comma = true
                }
            }
        }

        // Only the first failing step is recorded, together with its zipped request and response
        if (next && getinfo) {
            rawResponse = context.expand('${' + "$name" + '#RawResponse}')
            zResponse = zip(rawResponse)
            request = context.expand('${' + "$name" + '#RawRequest}')
            zRequest = zip(request)
            assertion = assertion.replaceAll(/([^a-zA-Z0-9 _,:])/, '-')
            message = message.replaceAll(/([^a-zA-Z0-9 _,:\[\]])/, '-')
            // substring(), not substr(): Groovy strings have no substr method
            if (assertion.length() > 128) { assertion = assertion.substring(0, 128) }
            if (message.length() > 1024) { message = message.substring(0, 1024) }
            lquery = "INSERT INTO test.soapuiresults (uuid,assertion,tstatus,message,step,request,response,tsuite,tcase) VALUES ('"+uuid+"','"+assertion+"','F','"+message+"','"+tname+"','"+zRequest+"','"+zResponse+"','"+testsuitename+"','"+testcasename+"');"
            try {
                con.execute(lquery)
                next = false
            } catch (Exception e) {
                log.error e.message
            }
            comma = false
            getinfo = false
        }
    }
}

// No failure was recorded, so store a single "passed" row
if (next) {
    lquery = 'INSERT INTO test.soapuiresults (uuid,assertion,tstatus,message,step,tsuite,tcase) VALUES ("'+uuid+'","No Errors","P","All Passed","All","'+testsuitename+'","'+testcasename+'");'
    try {
        con.execute(lquery)
    } catch (Exception e) {
        log.error e.message
    }
}
con.close()

// Gzip the text and return it Base64-encoded so it can be stored as a string
def zip(String s) {
    def targetStream = new ByteArrayOutputStream()
    def zipStream = new GZIPOutputStream(targetStream)
    zipStream.write(s.getBytes())
    zipStream.close()
    def zipped = targetStream.toByteArray()
    targetStream.close()
    return zipped.encodeBase64()
}


  • Please note that, in order to save HTML/XML requests and responses, I chose to zip them and store them in the database as a blob field. Later, I use the unzipped text as a text attachment in the Allure dashboard.
  • If you have the Ready!API Pro version, you can add this code just once as an event handler (right-click the project node, select Events, and create a "TestRunListener.afterRun" event).
  • Now you need to retrieve the data from the database (selecting by uuid) in your Allure Maven TestNG suite, and simply pass or fail each test with the regular Assert methods.

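On the Java side, the stored request and response have to be turned back into text before they can be attached to the report. The following is a minimal sketch of that step, assuming the column holds exactly what the Groovy zip() function produced (gzip, then Base64) and that the text is UTF-8. The class name SoapUiPayloadCodec is made up for illustration; its zip() method is only there to mirror the Groovy side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SoapUiPayloadCodec {

    /** Mirrors the Groovy zip(): gzip the text, then Base64-encode the bytes. */
    public static String zip(String text) {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(target)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(target.toByteArray());
    }

    /** Reverses zip(): Base64-decode the stored value, then gunzip it back to text. */
    public static String unzip(String stored) {
        byte[] compressed = Base64.getDecoder().decode(stored);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The string returned by unzip() is what you would pass to your Allure text attachment method after selecting the row by uuid.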
Thursday, July 21, 2016

A combination of SoapUI, Selenium WebDriver, PhantomJS and Groovy!

Our company has a monitoring system for most of its sensitive web services and websites, including local and intranet sites. Most of this monitoring is done on a few Linux CentOS boxes that execute the SoapUI testrunner.sh tool through NRPE/Nagios and return the final results to a dashboard.

Writing SoapUI projects and scripts for web services, or even for web GUIs with one or two pages, is not hard, even in our case, where most page elements require an extensive authentication process and security tokens passed in various ways. However, the job gets painful when a full user scenario has to be created as a monitoring script.

Let me give you an example: if the use case is to open a login page, enter the credentials, and ensure all elements on the main page exist and are valid, it can be done with SoapUI HTTP steps by following the sequence of network activity. This activity can be sniffed using any browser's developer tools, or the SoapUI recorder itself.
Now, if the use case is more complicated, such as filling in a form after login, submitting it, checking the next page, submitting a second form, waiting for an email to arrive and replying to it for confirmation, then ensuring the subscription finished successfully, there is no easy way to do it using SoapUI steps and simple Groovy coding.


The solution here is to use a WebDriver with a headless browser such as PhantomJS.

I managed to make it work after two days, and here are the lessons I learned:

1- After unpacking the phantomjs tar file, make sure the /bin/phantomjs binary is executable for all groups.
2- Run "ldd phantomjs" and ensure all libraries are installed. Most probably you'll need to update your GLIBC and GLIBCXX (via libstdc++).
3- Make sure your phantomjs works. To do that, create a file like loadpage.js:

var page = require('webpage').create();
page.open('http://www.google.com', function(status) {
  console.log("Status: " + status);
  if(status === "success") {
    page.render('example.png');
  }
  phantom.exit();
});

4- Then execute it like this: "./phantomjs loadpage.js". If it reports a failure and you are sure there is no network restriction to your destination, try "./phantomjs --ignore-ssl-errors=true loadpage.js"

5- If your machine has JRE 1.6, you have to use older Selenium jars (I recommend selenium-java-2.45.0 because it has the most recent phantomservice jar)

6- Here is how I opened my Groovy script. Every piece of this code is vital and has its good reason:

import org.openqa.selenium.*
import org.openqa.selenium.firefox.FirefoxDriver
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriverService;
import org.openqa.selenium.interactions.Actions
import org.openqa.selenium.remote.DesiredCapabilities
import org.openqa.selenium.support.ui.Select
import java.io.*
import java.util.concurrent.TimeUnit
import org.openqa.selenium.support.ui.ExpectedCondition
import org.apache.commons.io.FileUtils
import org.openqa.selenium.support.ui.WebDriverWait;
import org.openqa.selenium.support.ui.ExpectedConditions;
import java.util.Random
import java.util.logging.Level;
import java.util.logging.Logger;

def baseurl = context.expand( '${#TestCase#baseurl}' )
def username = context.expand( '${#TestCase#username}' )
def password = context.expand( '${#TestCase#password}' )
def outfolder = context.expand( '${#TestCase#outfolder}' )
def phantomfile = context.expand( '${#TestCase#phantomdriver}' )


dCaps = new DesiredCapabilities();
dCaps.setJavascriptEnabled(true);
dCaps.setCapability("elementScrollBehavior", true);
dCaps.setCapability("phantomjs.page.settings.userAgent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36");
ArrayList<String> cliArgsCap = new ArrayList<String>();
cliArgsCap.add("--webdriver-loglevel=NONE");
cliArgsCap.add("--ignore-ssl-errors=true");
cliArgsCap.add("--web-security=false");
cliArgsCap.add("--ssl-protocol=any");
cliArgsCap.add("--webdriver-logfile=none");
dCaps.setCapability(PhantomJSDriverService.PHANTOMJS_GHOSTDRIVER_CLI_ARGS, "--ignore-ssl-errors=yes");
dCaps.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgsCap);
dCaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, phantomfile);
def WebDriver driver = new PhantomJSDriver(dCaps)
Logger.getLogger(PhantomJSDriverService.class.getName()).setLevel(Level.OFF);
Logger.getLogger(org.openqa.selenium.phantomjs.PhantomJSDriverService.class.getName()).setLevel(Level.OFF);
Dimension d = new Dimension(1280, 1024);
driver.manage().window().setSize(d);
driver.manage().window().maximize();

driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);


7- Now go ahead and write your code as you usually do. Java and Groovy are very similar when it comes to WebDriver. Just make sure you use plenty of try{..}catch{..} blocks.

8- Don't forget to log, assert, fail, and quit properly when you get to the catch{} part. The proper way is the following order:

  1. log the failure
  2. quit the driver
  3. fail the test case
  4. assert

    log.error(failure_text_message)
    driver.quit()
    testRunner.fail(failure_text_message);
    assert "XX Page Elements Not found, please check the logs" == failure_text_message

    return



Wednesday, June 10, 2015

How JMeter becomes more powerful when used with BlazeMeter

Backend-Based Functional Testing Using JMeter

Software testing and development have evolved a great deal in recent times.
Functional testing used to be done locally, on software that ran without any external connectivity. Then came the nineties, and testing moved to internet-based applications accessed through browsers. Fast forward a couple of decades, and smartphones, tablets, the IoT and wearables have changed the way applications are used. Nowadays an application's backend often houses the functionality, while the frontend is mainly concerned with how the data is presented within the boundaries of the user experience. Naturally, this has also changed the way modern applications are tested for their functionality.

Functional Testing Using JMeter
JMeter was initially built as an open source solution for performance and load testing, but it can be used for backend functional tests too. For instance, part of your user registration functionality can be tested with JMeter by exercising your system's APIs and making sure that users are created successfully in your database. Instead of opening a browser and manually entering random data in the fields, you can use JMeter to create API calls with various user names and passwords.
JMeter supports the following protocols:
·         Web: HTTP and HTTPS sites, web 1.0 and web 2.0 (ajax, flex and flex-ws-amf)
·         Web Services: SOAP or XML-RPC
·         Database through JDBC drivers
·         Directory: LDAP
·         Messaging-oriented service through JMS
·         Mail service: SMTP, IMAP, POP3
·         FTP service
·         Hadoop (and Kafka with the Kafkameter plugin)
·         UDP services
RESTful APIs normally use the POST, GET or PUT methods of HTTP and carry the data to be transferred either as POST parameters or as a JSON payload. JMeter makes functional testing quite easy with its dedicated HTTP(S) sampler. The data in a tested application can be used to create a request sample, and an external file then supplies the various parameter values that you want to test.
Assertions can also be used to configure functional tests in JMeter. An assertion checks the reply sent by an API and indicates whether the process failed or the desired data was received. For instance, during the testing of user registration, the reply would indicate whether a user ID was created or not.
Java unit testing can also be performed using JMeter with JUnit. JMeter has native support for JUnit, so it can be included in the overall functional testing strategy. Any JUnit tests that were created while developing the software can be included in the testing and used alongside API calls.
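To make the idea of parameter files and assertions concrete, here is a small sketch in plain Java. It is not JMeter code and does not use any JMeter API; it only imitates what a CSV parameter file and a response assertion do in a registration test. The reply format with a "userId" field and the class name RegistrationCheck are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegistrationCheck {

    // Assumed reply format for a successful registration: {"userId": 123}
    private static final Pattern USER_ID = Pattern.compile("\"userId\"\\s*:\\s*\\d+");

    /** Plays the role of a response assertion: passes only if the reply carries a user ID. */
    public static boolean userCreated(String responseBody) {
        return responseBody != null && USER_ID.matcher(responseBody).find();
    }

    /** Turns "username,password" lines (the external parameter file) into JSON request bodies. */
    public static List<String> requestBodies(List<String> csvLines) {
        List<String> bodies = new ArrayList<>();
        for (String line : csvLines) {
            String[] parts = line.split(",", 2);
            if (parts.length < 2) {
                continue; // skip malformed lines instead of failing the whole run
            }
            bodies.add("{\"username\":\"" + parts[0].trim()
                    + "\",\"password\":\"" + parts[1].trim() + "\"}");
        }
        return bodies;
    }
}
```

In a real JMeter test plan the same roles would be filled by a parameter data file feeding the HTTP sampler and by an assertion attached to it.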
Testing User Experience
Verifying both the user experience and the backend functionality is essential. JMeter cannot be used to test the user experience; solutions like SauceLabs and Perfecto Mobile offer frontend testing tools for that.
To verify backend as well as frontend functionality, you can use SauceLabs to run Selenium-based tests on various browsers and operating systems.
The Perfecto Mobile environment allows you to run functional tests on mobile devices, supporting native apps as well as browsers.
Full Functionality Testing of Your App
Your testing armory should ideally be equipped with a suite of tools to test the full functionality of your application. Both frontend and backend functionality tests should be carried out to cover all the bases, as the two complement each other.
JMeter is a true winner here: it was originally developed as an open source load testing tool, but it has readily adapted to the changes in functional testing. The same samplers and plugins that are used to produce load can be used by developers for functional testing. This, along with the integration of various frontend tools, provides an end-to-end environment for ensuring the functionality of your application.


Load testing is an important step when working with web applications and SaaS-based applications. Then comes the question of how to do it: set up your own lab or use a cloud-based testing service. The load testing process involves setting up the test bed, writing automated scripts, and continuously maintaining and monitoring the infrastructure, which becomes very tedious. Using the cloud for load testing has several obvious benefits. Here are a few of them:
·         Cost benefits: You save a lot of money by not buying infrastructure
·         Time: Maintaining the software is the service provider's responsibility, so your time is saved
·         Flexibility: You can run your tests from anywhere; all you need is an internet connection
·         Meeting deadlines: Because the provider keeps the infrastructure running, outages are less likely to disrupt your work schedule and you can stick to deadlines
·         Excellent service: The level of service is often higher than what an in-house IT department can offer
·         Team efforts saved: Your knowledgeable technical team doesn't have to spend hours setting up and maintaining the test environment and can focus mainly on testing and reports
·         Real load benefits: The main feature of cloud-based services is that you get realistic load that reflects live scenarios
·         Continuous customer support: You can rely on the third party to help whenever you have a question, and you can continue your performance testing for as long as you need

When it comes to testing applications in the cloud and making them scalable, load testing for 50K users becomes a challenge. This is where a SaaS-based load testing tool comes in: BlazeMeter, which is highly scalable and can handle a load of more than 300K users. BlazeMeter is essentially "JMeter in the Cloud".

Advantages of BlazeMeter:

JMeter compatibility
JMeter, being an open-source tool, is often the first choice for load testing in a project. It is the most common tool and is recommended for its stability and performance. BlazeMeter provides 100% compatibility with JMeter scripts and also addresses some limitations of JMeter. Scripts from older JMeter versions can be reused with BlazeMeter, which saves a great deal of effort.
Various Plugins
1. BlazeMeter can also work with Chrome to record browser actions and convert them to a .jmx file.
2. BlazeMeter obtains the last 12 months of data from Google Analytics and automatically creates a test for the five most visited pages; based on this record it can set the number of concurrent users too.
3. WordPress users can test their app with the BlazeMeter plug-in, without any scripting.
Real Load availability
A load test is effective and meaningful only when the load comes from all types of sources. When emulators are used and virtual load is created from the same IP, the results are sometimes unrealistic. BlazeMeter can provide load from multiple IPs, which is vital for load testing cloud applications.
Network Emulator: BlazeMeter has an option to customize various network types (3G, Wi-Fi, broadband etc.) and their bandwidth (download limit etc.).
Flexibility of controlling agents
JMeter uses a Master-Agent architecture in which the Master controls multiple agents that generate the load. The number of agents is a parameter fixed before the test runs. With BlazeMeter, the number of agents can be changed dynamically while the tests are running, and any instance can act as Master or Agent during the test run.
Live Server Monitoring for Throughput
JMeter requires the target throughput parameter to be set up front, and the application's performance is compared against this threshold. BlazeMeter, in contrast, lets you change this value at runtime according to the application's performance, so the server can be monitored at various levels while the test runs.
Automatic Controlling of Agents
A load test strategy has to determine parameters such as ramp-up time, number of concurrent users, test engines, test iterations and test duration. In JMeter these values are configured before the test is started. EC2 instances have to be provisioned and maintained for the agents, and the IP addresses of the master and the agents have to be configured manually. Load testers are expected to maintain, manage and monitor the whole setup during the test cycles, which becomes very tedious for an EC2 load of 50K+ requests. BlazeMeter automatically sets the number of test engines, the number of threads and the engine capacity based on the number of concurrent users, and all of this is customizable. The team saves a tremendous amount of effort and can concentrate on testing.
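The sizing step that a tester otherwise does by hand is simple arithmetic. Here is a minimal sketch of it, assuming every engine can carry a fixed number of threads. The figure of 1,000 threads per engine used in the usage note is an assumption for illustration only, not a BlazeMeter or JMeter value, and EnginePlanner is a made-up name.

```java
public class EnginePlanner {

    /**
     * Number of load engines needed so that no engine runs more than
     * threadsPerEngine virtual users (rounded up).
     */
    public static int enginesFor(int concurrentUsers, int threadsPerEngine) {
        if (concurrentUsers < 0 || threadsPerEngine <= 0) {
            throw new IllegalArgumentException("users must be >= 0 and threadsPerEngine > 0");
        }
        return (concurrentUsers + threadsPerEngine - 1) / threadsPerEngine;
    }
}
```

For example, under the assumed capacity of 1,000 threads per engine, enginesFor(50000, 1000) gives 50 engines, and one extra user pushes it to 51.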
CSV per load test engine
BlazeMeter allows a different CSV file for each load test engine. In JMeter this is all manual. BlazeMeter keeps a common repository for all files, which each engine can refer to without manual copying.
Scheduling and autotest
Both BlazeMeter and JMeter allow you to schedule a test's start time and duration, so tests can be run at any time. BlazeMeter also allows fixed weekly schedules to be set.


With so many benefits, Blazemeter obviously has a bright future!!

Thursday, May 21, 2015

Testing Frameworks in 5 Minutes

What is a Testing Framework?

A testing framework is a set of principles, such as coding standards, processes and practices, hierarchies, reporting methods and test data injection, that should be followed during automation testing in order to get beneficial results. Test automation can still be done without a framework, but using one brings great advantages: code reusability, ease of scripting, reduced script maintenance cost, understandability, less manual intervention, maximum coverage, high portability, and easy reporting and recovery. In short, a test automation framework helps in developing, executing and reporting on automation test scripts efficiently. A framework also proves useful in a team, because each member follows the same approach to development. A framework does not depend on the type of application (technology or architecture).
Popular Types of Frameworks:

Module Based Testing
  Definition: The module based testing framework derives from the OOP concept of abstraction. The whole application under test is broken down into a number of modules, and each module has its own test script. The smaller test scripts are integrated to build a larger test script. The idea is to divide the modules in such a way that application changes won't affect the larger test scripts, so scripts are easily maintainable
  Advantages: Easier and cost effective maintenance of test scripts; scalable and can bear changes in the application easily
  Disadvantages: Test data manipulations are required every time

Library Architecture Testing Framework
  Definition: The common functions (based on common steps) from each script are segregated into a common library, and the functions can be called within scripts, e.g. starting and stopping a service in each test
  Advantages: Easier and cost effective maintenance of test scripts; reusability
  Disadvantages: Complicated and tedious to handle multiple sets of test data

Data Driven Testing Framework
  Definition: Test data is segregated from the script logic. It is stored externally in files (e.g. xml, csv) or ODBC repositories, and is mapped to different sets of input and expected output values
  Advantages: Covers all combinations of test data; changes in data are easily accommodated; flexible and easy to maintain
  Disadvantages: Complex, as data needs to be fed; requires knowledge of a programming language

Keyword Driven Testing Framework
  Definition: Test data plus a common set of code is segregated into an external file. The set of code is known as a keyword. Keywords and test data are stored in tabular form, so this framework is also called a Table Driven Framework
  Advantages: Programming language knowledge is not needed. The same keyword can be used multiple times
  Disadvantages: Complex and requires keyword creation knowledge

Hybrid Testing Framework
  Definition: A combination of more than one framework, e.g. keywords and data used together in the same sheet
  Advantages: Advantages of all the frameworks used

Behavior Driven Development Framework
  Definition: Functional validations are written in an easily understandable format, so that anyone in the project hierarchy can understand them, e.g. Cucumber, JBehave etc.
  Advantages: Programming language knowledge not needed

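To illustrate the difference between the data driven and the keyword driven approach, here is a small self-contained sketch in plain Java. Everything in it is hypothetical: the normalize() function stands in for the application under test, and the keywords TRIM and LOWER are invented for the example. In a real framework the table rows would come from an external csv or spreadsheet rather than from arrays in the code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FrameworkSketch {

    // Stand-in for the application under test: normalizes a user name.
    public static String normalize(String raw) {
        return raw.trim().toLowerCase();
    }

    // Data driven: rows of {input, expected output} kept apart from the test logic.
    static final String[][] TEST_DATA = {
        {"  Alice ", "alice"},
        {"BOB", "bob"},
        {"carol", "carol"},
    };

    /** Runs the same check against every data row and returns the number of failures. */
    public static int runDataDriven() {
        int failures = 0;
        for (String[] row : TEST_DATA) {
            if (!normalize(row[0]).equals(row[1])) {
                failures++;
            }
        }
        return failures;
    }

    // Keyword driven: each keyword names a reusable action.
    static final Map<String, UnaryOperator<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("TRIM", String::trim);
        KEYWORDS.put("LOWER", String::toLowerCase);
    }

    /** Executes a test written as a sequence of keywords applied to the test data. */
    public static String runKeywords(String input, String... steps) {
        String value = input;
        for (String step : steps) {
            UnaryOperator<String> action = KEYWORDS.get(step);
            if (action == null) {
                throw new IllegalArgumentException("Unknown keyword: " + step);
            }
            value = action.apply(value);
        }
        return value;
    }
}
```

Adding a test case in the data driven style means adding a row; in the keyword driven style it means writing a new sequence of existing keywords, which is why no programming knowledge is needed once the keywords exist.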
Test Automation Framework Design
Designing a framework requires a well planned approach based on common, successfully implemented industry standards. A good design should:
  • Be application-independent
  • Be scalable and maintainable
  • Shield the testers from the complexities
  • Identify common functions and keep them in a common library
  • Separate test data from the test scripts
  • Let scripts run on their own with minimal errors
  • Be kept under version control
Here's an overview of the steps to design a framework:
1.       Identify the testing scope
2.       Identify the types of testing required in the project
3.       Identify the scope of automation
4.       Decide on the automation tool
5.       Data input store:
          • Do the object mapping and decide the syntax for object identifiers, e.g. the Login button with an alias name
          • Identify all the possible test scenarios and decide the test cases based on the sequence of flows, e.g. TC1 starts the service and logs in, TC2 performs a transaction etc.
          • Decide the list of custom messages for error handling
          • Create driver files containing the list of data files, transaction ids etc.
          • Identify the file formats
6.       Develop the framework:
          • Decide the scripting language based on the tool
          • Create utility functions based on the design
          • Create the driver and worker scripts
          • Identify the approaches for reusable utilities, e.g. data driven or keyword driven
7.       Populate the input data store using manual or automated methods
8.       Configure schedulers, so that scripts can be initiated by anyone available


Saturday, May 9, 2015

Cloud Computing and Testing: A Simpler View


This write-up summarizes the benefits, design and framework, programming, and testing of clouds, both as a service and as a structure.

What is Cloud Computing?
Cloud computing has made a great impact on the IT industry. Data has moved away from personal computers and enterprise application servers to be clustered in the cloud.
Cloud computing is a model that provides a convenient, on-demand way to access and consume a shared pool of resources covering a wide variety of services: storage, networks, servers, applications and so on. Additionally, provisioning and releasing a service is easy to manage and doesn't always require the service provider's intervention.
To do this, clouds use large clusters of servers, connected through specialized data links for data processing, which bring low-cost technology benefits to consumers. Virtualization is often used to multiply the potential of cloud computing.


It has three delivery models:

Infrastructure as a Service (IaaS)
1. It's the basic layer of the cloud
2. Servers, networks and storage are provided by the service provider
3. Software etc. is the cloud consumer's responsibility

Platform as a Service (PaaS)
1. The consumer has no control over the underlying infrastructure
2. The service provider supplies a platform, e.g. a web server, a database or a content management tool like WordPress, which helps in application development
3. Here you will have a virtual machine with all the necessary software

Software as a Service (SaaS)
1. Here the whole application is outsourced to the cloud provider
2. It is the provider's responsibility to manage license and access related issues
3. Examples are Google Docs or any hosted email service


Types of Clouds:

Public
1. Services are available to all
2. The service provider delivers them over the internet, and its applications reach the widest group of users

Private
1. Services (equipment and data centres) are private to the organization
2. Secure access is given to the users of the organization

Hybrid
1. A mixture of both
2. Some services of the organization can be used by all, and some are private to users inside it

Cloud computing has its benefits, but it has limitations too: will the data keep its integrity, will it be secure, will it stay private, and will the services be available to all at all times?
This is where the need for testing comes in.

Types of Testing in Cloud Computing:

Testing a Cloud
Functional Testing

1. System Verification Testing: Functional needs are tested
2. Acceptance Testing: User testing is done to check that requirements are met
3. Interoperability Testing: The application should function well anywhere, even if it is moved away from the cloud
Non Functional Testing
1. Availability Testing: It is the cloud vendor's responsibility that the cloud runs without sudden downtime and without affecting the client's business
2. Security Testing: Making sure that there is no unauthorized access and that data integrity is maintained
3. Performance Testing: Stress and load testing to make sure that performance remains intact under both peak and reduced load
4. Multi Tenancy Testing: Testing to make sure that services are available to multiple clients at the same time and that data is kept secure to avoid access level conflicts
5. Disaster Recovery Testing: Verification that services are restored after a failure with a short disaster recovery time and no harm to the client's business
6. Scalability Testing: Verification that services can be scaled up or down as needed
7. Interoperability Testing: It should be easy and possible to move a cloud application from one environment/platform to another

How does a Cloud store and process data?

Hadoop and MapReduce:
Earlier, when data volumes were manageable, data was stored in databases with a defined schema and relations. As data grew into Big Data, terabytes and petabytes with the distinctive "write once, read many" (WORM) characteristic, Google introduced GFS (Google File System), which was not open source. Google also developed a new programming model called MapReduce, a software framework for writing programs that process enormous amounts of unstructured data in parallel across a distributed cluster of processors. In addition, Google introduced BigTable, a distributed storage system for structured data that scales to very large sizes: petabytes of data across thousands of commodity servers.
Later, the Hadoop Distributed File System (HDFS) was developed as an open source project distributed by Apache. The software framework it uses is MapReduce, and the whole project is called Hadoop.
MapReduce uses four entities:

Client: submits the MR job
Jobtracker: helps in managing the job run. It is a Java application whose main class is JobTracker
Tasktracker: runs the tasks into which the job is divided
Distributed file system: (commonly HDFS) used to share files among the entities

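The programming model itself can be shown without a cluster. The sketch below is plain Java, not Hadoop code, and uses none of the Hadoop API: it counts words by running a map step, grouping the emitted pairs by key (the "shuffle" that the framework performs between the phases), and then running a reduce step per key. The class name WordCountSketch is made up for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle: group all emitted values by key. */
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the counts collected for one word. */
    public static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    /** Runs map, shuffle and reduce in sequence over all input lines. */
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }
}
```

In Hadoop the map and reduce calls run as separate tasks on different machines, which is what lets the same two functions scale to very large inputs.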




Properties of HDFS:
Large: consists of thousands of server machines, each storing a fragment of the system's data
Replication: each data block is replicated a number of times (default 3)
Failure: it is treated as the norm rather than as an exception
Fault tolerance: faults are detected and recovered from quickly and automatically

Hadoop doesn't waste time diagnosing slow-running tasks; instead, it just detects when a task is slower than expected and launches a replica of it as a backup.



Apache HBase:
HBase is the Hadoop database. It is an open source implementation of BigTable and is used when Big Data needs real-time, random read/write access. It holds very large tables of billions of rows by millions of columns, and it is a distributed storage structure for structured data. HBase is a NoSQL database that stores data as key/value pairs in columns, while HDFS uses flat files. It therefore combines the scalability of Hadoop, by running on HDFS, with real-time random data access through its key/value store and the problem-solving properties of MapReduce.
HBase uses a four-dimensional data model, and these four coordinates define each cell:

Row Key: Every row has a unique key; the row key does not have a data type and is treated internally as a byte array.
Column Family: Data inside a row is organized into column families. Every row has the same set of column families, but across rows the same column family does not need the same column qualifiers. HBase stores column families in their own data files, so they must be defined upfront, and it is hard to make changes to them.
Column Qualifier: Column families define the actual columns, which are known as column qualifiers. Column qualifiers can be thought of as the columns themselves.
Version: Every column can have a configurable number of versions, and data can be accessed for a specific version of a column qualifier.

HBase allows two types of access: random access to rows through their row key, column family, column qualifier and version, and offline or batch access through MapReduce queries. This dual approach makes it very powerful.

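One way to picture the four coordinates is as nested sorted maps. The sketch below is plain Java, not the HBase client API, and it simplifies two things on purpose: keys and values are Strings instead of byte arrays, and the version is a plain long. The class name CellStoreSketch is made up for this example.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class CellStoreSketch {

    // row key -> column family -> column qualifier -> version -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    /** Stores one cell value under its four coordinates. */
    public void put(String row, String family, String qualifier, long version, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<>())
            .put(version, value);
    }

    /** Returns the value stored for exactly this version, or null if there is none. */
    public String get(String row, String family, String qualifier, long version) {
        NavigableMap<Long, String> versions = versions(row, family, qualifier);
        return versions == null ? null : versions.get(version);
    }

    /** Returns the value with the highest version, or null if the cell does not exist. */
    public String getLatest(String row, String family, String qualifier) {
        NavigableMap<Long, String> versions = versions(row, family, qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
    }

    // Walks the first three coordinates and returns the version map, or null if any level is missing.
    private NavigableMap<Long, String> versions(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) {
            return null;
        }
        NavigableMap<String, NavigableMap<Long, String>> qualifiers = families.get(family);
        return qualifiers == null ? null : qualifiers.get(qualifier);
    }
}
```

Because every level is a sorted map, looking up a cell by its coordinates is the random access path described above, while iterating over the outer map corresponds to scanning rows in key order.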
 

QA Testing your MR jobs: which is actually testing the whole Cloud
Traditional unit testing frameworks, e.g. JUnit, PyUnit etc., can be used to get started with testing MR jobs. Unit tests are a great way to test MR jobs at the micro level, although they don't test MR jobs as a whole inside Hadoop.

MRUnit is a tool for unit-testing map and reduce functions. It works the same way as traditional unit tests, so it's simple and doesn't require Hadoop to be running. MRUnit has some drawbacks, but the benefits outweigh them.
MRUnit tests are simple. No external I/O files are needed and the tests are faster. Illustration of a test class:
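import org.apache.hadoop.io.Text
// Use org.apache.hadoop.mrunit.MapReduceDriver instead for the older mapred API
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver
import org.junit.Before
import org.junit.Test

// Dummy.MyMapper and Dummy.MyReducer are the classes under test
class DummyTest {
  private Dummy.MyMapper mapper
  private Dummy.MyReducer reducer
  private MapReduceDriver driver

  @Before void setUp() {
    mapper = new Dummy.MyMapper()
    reducer = new Dummy.MyReducer()
    driver = new MapReduceDriver(mapper, reducer)
  }

  @Test void testMapReduce() {
    // Feed one input record and state the single output record that is expected
    driver.withInput(new Text('key'), new Text('val'))
        .withOutput(new Text('foo'), new Text('bar'))
        .runTest()
  }
}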
Map and Reduce can be tested separately, and counters can be tested too.
During a job execution, counters tell whether a particular event occurred and how often. Hadoop has 4 types of counters: File system, Job, Framework and Custom.
Traditional unit tests and MRUnit help in detecting bugs early, but neither can test MR jobs within Hadoop. The local job runner lets you run Hadoop on a local machine, in a single JVM, which makes MR jobs a little easier to debug when a job fails.

A pseudo-distributed cluster consists of a single machine running all the Hadoop daemons. It tests integration with Hadoop better than the local job runner.

Running MR jobs on a QA cluster: this is the most exhaustive, but also the most complex and challenging, way of testing MR jobs. It needs a QA cluster of at least a few machines.


QA practices should be chosen based on organizational needs and budget. Unit tests, MRUnit and the local job runner can test MR jobs extensively in a simple way, but running jobs on a QA or development cluster is obviously the best way to fully test MR jobs.

I hope this blog has shown you that the study of the cloud is as vast as a cloud itself.