Dave Byrne, Author at VooServers - Page 2 of 2

Troubleshooting Xen Virtual Machine Network Misuse and Over-Use on the Hyper-Visor





If you or your company provide virtual servers in a Xen virtualisation environment, it’s probably safe to say that you’ve run into network overuse or misuse on one or more of your hypervisors. Finding the VM responsible can be tricky, as many control panels don’t report live virtual interface data (and even if they did, you couldn’t connect to them during a large-scale attack).

We’ve compiled a few of the simplest, and most direct ways of pinpointing exactly which pesky VM is the cause. The only thing you need to have installed? Sysstat.

Network Misuse or Overuse (Inbound or Outbound Attacks)

If your network graphs alert you to spikes or suspicious activity, such as bursts of, or sustained, high PPS (packets per second), you could have an attack on your hands. With budget VMs so cheap and attainable, and instant deployment pretty much the norm, it makes sense for malicious third parties to use them as staging platforms for traditional traffic-based DDoS and other common reflection-based attacks.

If the attack is large enough, you will struggle to connect to your hypervisor over the network, so physical access may be required for this one.

The following command will give you a solid overview of network use per interface. This includes the virtual interfaces bound to your VMs:

sar -n DEV 1 3

Explained:

This command uses sar, a handy tool that collates and displays data from the system activity counters. It can also display, in more useful ways, the contents of binary data files containing system performance history.

  • -n – Reports the network statistics
  • DEV – Targets specifically the network devices
  • 1 – Interval, in seconds, between sar samples
  • 3 – Number of samples to take before sar prints the averaged results

Running the command should garner you something along the lines of this:

[Image: sample output of sar -n DEV 1 3]

The above is largely normal, if you excuse the odd marginally high traffic level. The first 4 data columns are the ones of interest: received and transmitted packets per second, followed by received and transmitted kB/s. If a VM is attacking, or being attacked, these values will usually run into the hundreds of thousands, to the point where the columns merge together and the specific values become hard to read.
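
When the columns run together, it can be easier to let awk pick out the busy interfaces. The following is a minimal sketch rather than part of the original toolkit: it assumes the usual sysstat "Average:" layout (interface name, then rxpck/s and txpck/s), and the function name busy_ifaces, the sample interface name and the 50000 pps threshold are purely illustrative.

```shell
#!/bin/bash
# Sketch: list interfaces whose averaged packet rate exceeds a threshold.
# Assumes sar "Average:" rows laid out as: Average: IFACE rxpck/s txpck/s rxkB/s txkB/s ...
# busy_ifaces and the default limit of 50000 pps are illustrative, not from the article.
busy_ifaces() {
  local limit="${1:-50000}"
  awk -v limit="$limit" '
    $1 == "Average:" && $2 != "IFACE" && ($3 + 0 > limit || $4 + 0 > limit) {
      printf "%s rx=%s pps tx=%s pps\n", $2, $3, $4
    }'
}

# Usage on a hypervisor (LC_ALL=C keeps the "Average:" label and decimal points predictable):
#   LC_ALL=C sar -n DEV 1 3 | busy_ifaces 50000
```

Only the averaged rows are checked, so a single one-second burst will not trigger it; lower the threshold or lengthen the sampling window to suit your own baseline.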

The virtual interfaces are nicely named with the VM ID included. So this immediately tells you the unfortunate target or the unscrupulous attacker. However, you still don’t know the IP Address. And with the attacks ongoing, you still can’t log in to the friendly web GUI to suspend the VM.

The following command can help. There may also be times when you simply don’t want to shut the VM down, but you do want to stop the attacks at network level. Let’s assume you want the IP of vm1686.

find / -name vm1686.cfg -exec grep "vif" {} \;

Explained:

This uses a typical find command, combined with the -exec switch for added functionality.

  • / – Start search in the root
  • -name – Search by full file name
  • vmXXX.cfg – Substitute the VM ID into here
  • -exec grep "vif" {} \; – This executes a simple grep command on every result find returns, placing the filename of the found result after the grep parameter.

Tip: You could even go further with this and awk it to cut down on the un-needed information: | awk '{ print $3 }'

The output of the above should give you something that looks like this:

[Image: sample output of the find command, showing the vif line for the VM]

From there, you can block/blackhole/nullroute the IP as you please, without having to shut down the VM, and without ever needing to access your hypervisor’s web GUI.
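
If the awk column position differs between your config files, pulling the address out by pattern is one alternative. This is a hedged sketch only: it assumes the vif line carries the IPv4 address in plain dotted form, and vm_ips is a hypothetical helper name rather than anything from the original tooling.

```shell
#!/bin/bash
# Sketch: print the IPv4 addresses found on the vif line(s) of a Xen config file.
# vm_ips is a hypothetical helper; the config path is supplied by the caller.
# The pattern only checks the dotted-quad shape, not that each octet is 0-255.
vm_ips() {
  local cfg="$1"
  grep 'vif' "$cfg" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
}

# Usage, reusing the find from the article to locate the file:
#   find / -name vm1686.cfg | while read -r cfg; do vm_ips "$cfg"; done
```

Because it matches on shape rather than position, MAC addresses and interface names on the same line are ignored.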






In the world of hosted virtualisation environments, disk misuse (or overuse) is an all too common issue that you can face day to day. Often you may be left guessing as to which of the many VMs on a hypervisor is responsible.

We’ve compiled a few of the simplest, and most direct ways of pinpointing exactly which pesky VM is the cause. The only thing you need to have installed? Sysstat.


Physical/Underlying Disk Over-Utilisation:

If you monitor your disk IO levels (which you should) you may be alerted to certain disks having critically high levels of input/output utilisation. If this is the case, use the following.

iostat -d -x -k 5 3

Explained:
  • -d – Show the disk report (excludes the CPU report)
  • -x – Shows extended stats (the useful ones like %util and io queue size)
  • -k – Displays values in kB/s rather than in blocks/s (easier to understand the output)
  • 5 – Wait 5 seconds between reports
  • 3 – Print 3 reports in total; the first shows averages since boot, and each later report covers the preceding 5-second interval

This will give you an output something like what’s shown below:

[Image: sample output of iostat -d -x -k 5 3]


As you can see, this node is running fine right now. The far left column shows the device names, listing the physical underlying disks as well as the virtual devices attached to the VMs. A high IO culprit will be instantly visible: the disks causing the problems (often in pairs, as each VM has an image and a swap) will usually sit very close to 100% “%util”, and the various read and write columns will report tell-tale high numbers. Use the above as a reference for a healthy, functioning, production-level hypervisor.
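
On a node with a lot of dm-x devices, scanning for a high %util by eye gets tedious. As a rough sketch, assuming %util is the last column of the extended report (as it is in the sysstat versions we are aware of), something like the following can filter the output. The name busy_disks and the threshold of 90 are illustrative choices, not part of the original post.

```shell
#!/bin/bash
# Sketch: print devices whose %util (last column of "iostat -x" output) exceeds a threshold.
# busy_disks and the default threshold of 90 are illustrative assumptions.
busy_disks() {
  local limit="${1:-90}"
  awk -v limit="$limit" '
    /^Device/ { in_table = 1; next }   # header row starts a device report
    NF == 0   { in_table = 0; next }   # blank line ends the report
    in_table && $NF + 0 > limit { print $1, $NF "%" }'
}

# Usage: LC_ALL=C iostat -d -x -k 5 3 | busy_disks 90
```

Every report in the run is checked, so a device that is busy in more than one interval is printed more than once, which is a handy hint that the load is sustained rather than a blip.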

But wait, I hear you shouting… What good is a dm-x virtual device name? How can I resolve that to the VM name/number? Good question, see below for the next command to make use of.


lvdisplay | awk '/LV Name/{n=$3} /Block device/{d=$3; sub(".*:","dm-",d); print d,n;}'

Explained: I won’t dissect this fully, as it is a heavily awk’d lvdisplay. In brief, lvdisplay contains all of the information you need, and awk picks out the important parts. It starts by pulling the 3rd value of the “LV Name” line, which is the VM’s logical volume name; this includes the VM ID, which you can use to locate the VM later. It then takes the 3rd value of the “Block device” line (a major:minor pair) and replaces everything up to the colon with the text “dm-”, leaving the dm-x name that iostat reports.

The output will be something like this:

[Image: sample output listing each dm-x device alongside its logical volume name]
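
If you only care about one suspect device, the same idea can be narrowed to a single lookup. This is a small sketch built on the same “LV Name” and “Block device” lines as the command above; lv_for_dm is a hypothetical helper name, and the volume names in any example output will of course be your own.

```shell
#!/bin/bash
# Sketch: resolve one dm-X device name to its logical volume, reading lvdisplay text on stdin.
# lv_for_dm is a hypothetical helper; it relies on the "LV Name" and "Block device" lines.
lv_for_dm() {
  local want="$1"
  awk -v want="$want" '
    /LV Name/      { name = $3 }
    /Block device/ {
      dev = $3
      sub(".*:", "dm-", dev)             # 253:3 becomes dm-3
      if (dev == want) { print name; exit }
    }'
}

# Usage: lvdisplay | lv_for_dm dm-3
```

It prints nothing when the device is not backed by a logical volume, which is itself a useful clue that you are looking at a physical disk or a non-LVM mapping.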


It is now very easy to tie together the suspect dm-x device you found earlier, to a much more useful VM ID. If you want a quick fix, issue:

xm reboot vmXXX

This will gracefully reboot the VM if it is still responding.

xm console vmXXX

This will open the VM’s console, where you can either see what’s going on, or log in and stop any processes you deem unruly.

xm destroy vmXXX

This will force an immediate shutdown of the VM.






The importance of server location

With stronger links in network transport, and high speed travel more available, having servers deployed in multiple geographical locations is now easier than ever. With that in mind, you could ask why you might need a server somewhere other than where you are personally based?

The primary reason for placing a server in a specific physical location is latency. Latency can be defined as the time it takes for you, in one location, to send a request to the remote server and receive a valid response back from it. Lower latency means that request takes less time to bounce back from the server, which translates very literally into how quickly a server can respond to requests and serve content. As you can probably tell, having as low a latency as possible is not only desirable, but in some cases a requirement. Custom backend systems and VoIP systems, for example, rely very heavily on low-latency connections to avoid data loss or garbled voice.

The main way to reduce latency is to lower the number of device hops between the user and the server. The fewer hops, the lower the latency should be, and a shorter physical distance to the server generally means fewer hops. If your users are located in Germany, for example, it may be best to position your server in a Germany-based datacentre, even though you as the administrator may very well be based in the US or the UK.
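
If you want to put a number on this before choosing a location, ping from where your users are is the simplest measure. The sketch below assumes the summary line printed by the Linux iputils version of ping (rtt min/avg/max/mdev = ...); avg_rtt is a hypothetical helper name and the hostname in the usage line is a placeholder.

```shell
#!/bin/bash
# Sketch: pull the average round-trip time (in ms) out of Linux iputils ping output on stdin.
# avg_rtt is a hypothetical helper; it expects the summary line
#   rtt min/avg/max/mdev = a/b/c/d ms
avg_rtt() {
  awk -F'=' '/^rtt / { split($2, v, "/"); print v[2] }'
}

# Usage (placeholder hostname):
#   ping -c 5 server.example.com | avg_rtt
```

Running it against a candidate server in each location, from a machine near your users, gives a like-for-like comparison.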

Another key reason why geographical server location might be important is SEO, and localised results or biasing. If you are targeting a product or service at a certain geographical demographic, search engines such as Google and Bing may give more weight to your listings in that area if the content being served is physically located in the same area (or at least the same country).


Our Datacentres

We have datacentres in the US (New York), Germany (Frankfurt) and in the UK (Kent). These locations were chosen primarily for their strong global core network links. Overall speed and availability of interconnects to different transit providers are the main reasons for us creating our point of presence in these countries.

We are perfectly placed to sculpt and provide extremely low latency routes for clients in most of mainland Europe through our Frankfurt presence, while dedicated Layer 2 links between our UK and US facilities mean we can offer extremely direct and efficient routes across the Atlantic Ocean. This lends itself very nicely to failover applications, something we pride ourselves on providing to clients.

With the technical talk out of the way, we also chose these locations as they are nicely spaced out geographically. We can provide services locally to many areas of the globe, and due to the physical distance between them all, even in the event of a nationwide network connectivity problem, only a small part of our network would ever be affected.

If you would like more information on geographical failover or other services in any of our facility locations, please Contact Us.






As an Internet or Email Service Provider, one of your biggest tripping points will be spam output levels coupled with spam filtering efficiency at your outbound relays. High spam levels will lead to poor individual IP or network block reputation, something that could make or break your service levels to customers.

Here at VooServers we’ve taken many steps to ensure we have the best possible overview of our email and spam throughput at all times, with measures put in place to provide advance warning of any large-scale spam outbreak.

As an Email Service Provider or ISP, it should be high on your priority list to gain a comprehensive overview of your clients’ outbound email behaviours. Email abuse and spam can come in all shapes and sizes, from 100-200 emails a day from a dubious mailing list on a questionable Philippines domain, to hundreds of thousands of emails sent overnight from malicious PHP mailer scripts on a hacked webserver.

It goes without saying that sufficient security levels and common sense when accepting orders go a long way here. Screening certain countries and payment types, for example, will help to avoid the troublesome clients, and server security measures such as PHP function limitations and correct use of user directory jailing can minimise the chances of succumbing to a web server exploit. But there are always small windows that unscrupulous spammers and scammers can use to gain enough of a foothold to do some real damage. Your efficiency in monitoring the state of outbound mail relay queues is what will really help you in these situations.



At VooServers, we have developed several small scripts/applications to give our technical team enough of an edge to respond quickly to spam outbreaks. These include checks that run every 3 minutes on each of our webservers, checks that run every 3 minutes on our outbound mail relays, and a custom-built CLI application that can be executed on any Linux host running an Exim MTA to manage and view the mail queue without having to remember and correctly type in-depth Exim commands. Quick response and ease of use are key here, as is anything that can increase the speed of your team’s response.

See our Checks and Application code in our post HERE








A simple and efficient way of grabbing the S.M.A.R.T. health of any number of disks on an LSI hardware RAID controller, exiting with Nagios status codes.



We are providing the following code to be used as an information resource only. It is licensed under GNU GPL. Any use of, or development of the resources provided is strictly unsupported and VooServers Ltd will not be liable for any loss of or damage to business or personal assets.




#!/bin/bash
#
#Dave Byrne
#LSI Hardware RAID S.M.A.R.T Check
#

#Where is storcli?
storclibin="/opt/MegaRAID/storcli/storcli64"

#Where the per-disk results are collected
results="/usr/results.txt"

#Check if storcli actually exists, echo and exit warning if it isnt
if ! [ -e "$storclibin" ]
then
  echo "WARNING - StorCli Not Found"
  exit 1
fi

#Get number of underlying physical disks from storcli
underlyingdisks=$("$storclibin" /c0 show | grep "Physical Drives" | awk '{ print $4 }')

#Exit unknown if storcli did not report a usable disk count
if ! [[ "$underlyingdisks" =~ ^[0-9]+$ ]]
then
  echo "UNKNOWN - Could not read physical drive count from StorCli"
  exit 3
fi

#Start from an empty results file in case an earlier run was interrupted
: > "$results"

#Loop to grab the smart health result from all physical disks
counter=0
while [ "$counter" -lt "$underlyingdisks" ]; do
  smartctl -d sat+megaraid,"$counter" -a /dev/sda | grep "SMART overall-health self-assessment test result" >> "$results"
  counter=$((counter + 1))
done

#Analyse results and start giving exit codes
if grep -q FAILED "$results"
then
    #failed string found, time to work out what disk it was
    line=$(awk '/FAILED/{ print NR; exit }' "$results")
    #subtract 1 from line number as disks start at 0
    line=$((line - 1))
    diskinfo=$(smartctl -d sat+megaraid,"$line" -a /dev/sda | grep -E 'Device Model|Serial Number')
    printf "CRITICAL - Device ID #%s FAILURE Imminent\nDISK INFO:\n%s\n" "$line" "$diskinfo"
    rm -f "$results"
    exit 2
else
    #failed string not found, echo and exit ok
    echo "OK - All array disks healthy"
    rm -f "$results"
    exit 0
fi


This plugin is also published in Nagios Exchange HERE








We are providing the following code to be used as an information resource only. It is licensed under GNU GPL. Any use of, or development of the resources provided is strictly unsupported and VooServers Ltd will not be liable for any loss of or damage to business or personal assets.




Monitor Exim MTAs on sending hosts (web servers etc):


The following bash script is used to monitor the physical size of the common queue in Exim, that is, inbound and outbound aggregated together. It will perform a simple check on the number of individual items in the queue, and if over a predetermined limit, will send an email to a monitored mailbox with information regarding the queue.

An Exim mail queue summary will be sent, along with a breakdown of the top sending and receiving addresses, so that mail can be categorised as inbound or outbound and internal or external.

This allows staff to see if the queue is being saturated with inbound or outbound spam to and/or from internal or external email addresses, and in what sort of quantities. In most cases it is instantly obvious why the queue is growing in size.


#!/bin/bash
######### Edit here ##########

_mail_user=serverlogs@vooclients.com
_limit=100

##############################

clear;
_result="/tmp/eximqueue.txt"
_queue="`exim -bpc`"
bold=`tput bold`
normal=`tput sgr0`

if [ "$_queue" -ge "$_limit" ]; then
echo "============================================================================ " > $_result
echo "  Current queue is: $_queue (Limit is $_limit)" >> $_result
echo "============================================================================ " >> $_result
echo " " >> $_result
echo "Overall summary of Mail Queue:" >> $_result
echo "`exim -bp | exiqsumm`" >> $_result
echo " " >> $_result
echo " " >> $_result
echo "============================================================================ " >> $_result
echo "Top 10 Addresses being sent TO:" >> $_result
echo "(If address is internal, then mail is inbound. If address is external, then mail is outbound)" >> $_result
echo "============================================================================ " >> $_result
echo " " >> $_result
echo "`exim -bp | awk 'NR % 3 == 2 {print $1}' | sort | uniq -c | sort -rn | head -n10`" >> $_result
echo " " >> $_result
echo " " >> $_result
echo "============================================================================ " >> $_result
echo "Top 10 Addresses being sent FROM:" >> $_result
echo "(If address is internal, then mail is outbound. If address is external, then mail is inbound)" >> $_result
echo "============================================================================ " >> $_result
echo " " >> $_result
echo " " >> $_result
echo "`exim -bp | awk 'NR % 3 == 1 {print $4}' | sort | uniq -c | sort -rn | head -n10`" >> $_result
mail -s "Number of mails on `hostname` : $_queue (Limit is $_limit)" $_mail_user < $_result
cat $_result
fi

rm -f $_result

Monitor basic Postfix queue size (outbound relays):


The following bash script is rather simpler, as all we need here is an alert to tell us the Postfix queue size on the outbound spam mail relay. The relay has its own web interface where we can check in more detail what’s going on, and in most cases the previously documented Exim MTA monitor should also have alerted you to what’s going on.

The check monitors the “deferred” queue in Postfix, which is essentially everything that the relay couldn’t send at that moment in time. If you have a spam outbreak, this queue will not be empty!

#!/bin/bash
######### Edit here ##########

_mail_user=serverlogs@vooclients.com
_limit=400

##############################

clear;
_result="/tmp/postfixqueue.txt"
_queue="`find /var/spool/postfix/deferred -type f | wc -l`"

if [ "$_queue" -ge "$_limit" ]; then
echo "Current number of mails in the outbound queue: $_queue (Threshold is set at '$_limit')" > $_result
#echo "Summary of Mail queue" >> $_result
#echo "`exim -bp | exiqsumm`" >> $_result
mail -s "ALERT! Number of Mails in Outbound Queue on ESVA1: $_queue" $_mail_user < $_result
#cat $_result
fi

echo "Current number of mails in the outbound queue: $_queue (Threshold is set at $_limit)" > $_result
cat $_result
rm -f $_result

Application for Easier Exim Queue Management on Sending Hosts:


Sometimes your staff will not remember every command for every scenario, certainly not off the top of their heads, and time spent looking up the correct Exim command syntax is time in which more spam mails are being relayed. We developed this rather simple, but also rather helpful, command line application to perform various actions on the Exim queue in question.

Features:
  • Remove multiple messages from queue by full address/partial domain
  • Remove ALL messages from queue (Careful now..)
  • Print a detailed queue listing
  • Print a quick queue summary
  • Show a simple mail queue item count
  • Print Exim’s current activity
  • Test how Exim will route to a given address
  • Display message headers, body or logs via Message ID and save these to a file

#!/bin/bash
########################
# VooServers Exim Tool #
#         v0.2         #
#     -Dave Byrne      #
########################
printf "\n########################\n# VooServers Exim Tool #\n########################\n\n"
printf "Select an option:\n\n1. Remove messages from queue by address/domain\n2. Remove ALL messages from queue\n3. Print detailed queue listing\n4. Print quick queue summary\n5. Show simple mail queue item count\n6. Print Exim's current activity\n7. Test how Exim will route to an address\n8. Display headers, body or logs via Message ID\n\nEnter 1-8: "
read menuchoice
printf "\n-----------\nYou chose %s\n-----------\n" "$menuchoice"
case $menuchoice in
   1)
      printf "\n1. By sending address/domain\n2. By Receiving address/domain\n\nEnter 1-2:"
      read case1opt
      case $case1opt in
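        1)
          echo "Enter full sending address or .*@domain.tld to remove from queue:"
          read -r rmargss
          #Refuse an empty pattern, which would match every message
          [ -n "$rmargss" ] || exit 1
          exiqgrep -i -f "$rmargss" | xargs exim -Mrm
          exit 0
          ;;
        2)
          echo "Enter full receiving address or .*@domain.tld to remove from queue:"
          read -r rmargsr
          #Refuse an empty pattern, which would match every message
          [ -n "$rmargsr" ] || exit 1
          exiqgrep -i -r "$rmargsr" | xargs exim -Mrm
          exit 0
          ;;
      esac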
      ;;
   2)
      echo "Are you sure? Y/N"
      read conf1
      #Accept either case, since the prompt shows an upper-case Y
      if [ "$conf1" = "y" ] || [ "$conf1" = "Y" ]; then
        exim -bp | exiqgrep -i | xargs exim -Mrm
        printf "\nEntire Mail Queue Destroyed\n"
      fi
      ;;
   3)
      exim -bp
      ;;
   4)
      exim -bp | exiqsumm
      ;;
   5)
      count=`exim -bpc`
      printf "\nThere are $count mails in the exim queue\n\n"
      ;;
   6)
      printf "\n"
      exiwhat
      printf "\n"
      ;;
   7)
      printf "\n"
      echo "Enter full address to show route: "
      read routeaddr
      printf "\n"
      exim -bt $routeaddr
      printf "\n"
      ;;
   8)
      printf "\n1. Headers\n2. Body\n3. Logs\n\nEnter 1-3: "
      read msgopt1
        case $msgopt1 in
          1)
            printf "Enter Message ID: "
            read msgid1
            exim -Mvh $msgid1
            exim -Mvh $msgid1 > /var/tmp/$msgid1-header
            printf "\n\nMessage Header also saved to file: /var/tmp/$msgid1-header\n\n"
            ;;
          2)
            printf "Enter Message ID: "
            read msgid1
            exim -Mvb $msgid1
            exim -Mvb $msgid1 > /var/tmp/$msgid1-body
            printf "\n\nMessage Body also saved to file: /var/tmp/$msgid1-body\n\n"
            ;;
          3)
            printf "Enter Message ID: "
            read msgid1
            exim -Mvl $msgid1
            exim -Mvl $msgid1 > /var/tmp/$msgid1-log
            printf "\n\nMessage Log also saved to file: /var/tmp/$msgid1-log\n\n"
      esac
      ;;
   *)
      printf "\n\nInvalid option. Try again...\n\n"
      sleep 2
      ./eximtool
esac
exit 0






Customer Relationship Management (CRM) software has become a busy market in recent years, with offerings spanning from freeware 3rd parties, all the way up to Enterprise level systems. Microsoft Dynamics CRM is one of the strongest products within this software sector to date, and we’re proud to be able to offer fully managed deployments for Microsoft Dynamics CRM 2011 & 2013.



We firmly believe that the automation features brought to the table by Dynamics CRM can help your company’s efficiency and work throughput. Since CRM 4.0 in 2007/08, Workflows, Dialogs and Reports have been a constant area of focused development, and are now more efficient than ever at helping you and your business out.

Behind the scenes, Workflows can quietly wait for a set event to occur, or a date to roll past, before putting into action a vast array of commands designed to manipulate data from varying parts of your business on CRM. Dialogs can contain much the same commands, but come into their own if user input is required to decide the outcome of certain commands.

These actions can be as simple as sending out emails to members of your team when a record is resolved, or as involved as dynamically creating completely new records in separate entities based on actions performed on the source record. For example, the big quote that your Sales Team have been working on gets accepted. In a matter of seconds, the relevant parts of the quote are dropped into place in a new deployment task for your Technical Team to plan and roll out. There are no hand-typed emails between colleagues, prone to typing errors and often missing vital employees off that CC: list, and no worry about having to give the Technical Team access to the Sales department to view the quote details. Just fast, secure and efficient automation.

Using the same example, on successful service deployment, the Technical Team complete a pre-set-up Dialog, confirming that the client’s service is live. Backend processes spring into action and automatically update the customer’s profile with the relevant access details, and send out a pre-written, customised welcome email. There’s no worrying about whether your Technical Team have emailed the client or not, and no time wasted typing out client access details, which is prone to error.


Data Automation

Yet another way that Dynamics CRM can assist your company is by offering solutions to, or automatically carrying out tasks in response to, certain situations. A support desk, for example, may receive support requests for common tasks. With integration tools, we can have CRM action commands on servers and manage external processes, all automatically, removing the element of human error from tasks that are undertaken regularly.

To summarise, using automation tools within Dynamics CRM 2011/2013 can massively help your company. Not only will it decrease the level of user input required at almost all stages, but it will also increase the accuracy of any data that is used in more than one location. In addition to this, automation can enable entire departments to become proactive rather than reactive, which is especially useful if situations are repeated on a day-to-day basis.

Contact us for more information, or a live demo.







© VooServers Ltd 2016 - All Rights Reserved
Company No. 05598156