macronet

Nov 20, 2014
 

Warning: you need to recreate your cluster if it's already running and you want to switch to SSL.

  • Create certificates
openssl req -new -x509 -days 3650 -nodes -keyout galera.key -out galera.crt
  • Copy them to every node and configure them into use
# grep wsrep_provider_options /etc/mysql/conf.d/galera.cnf
wsrep_provider_options="socket.ssl_cert=/etc/mysql/cert/galera.crt;socket.ssl_key=/etc/mysql/cert/galera.key"
  • Shut down the cluster and bootstrap it
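One way to do the shutdown/bootstrap round, assuming a MariaDB Galera setup with sysvinit-style init scripts (exact commands depend on your distribution and MariaDB version):

# on every node
service mysql stop
# on the first node, bootstrap a new cluster
service mysql start --wsrep-new-cluster
# on the remaining nodes, join normally
service mysql start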
Nov 20, 2014
 

A quick reminder on how to recompile Bind9 with the MySQL SDB backend:

  • Prepare build environment
apt-get install build-essential fakeroot dpkg-dev devscripts
cd /usr/src/
apt-get build-dep bind9
  • Get source
apt-get source bind9/wheezy
  • Copy SDB files into place

mysql-bind$ cp mysqldb.h ../bind9-9.8.4.dfsg.P1/bin/named/include/
mysql-bind$ cp mysqldb.c ../bind9-9.8.4.dfsg.P1/bin/named/

  • Configure the sources (read the instructions on the project web page); quick diffs below
bind9-9.8.4.dfsg.P1/bin/named/main.c:
...
#include <dlz/dlz_dlopen_driver.h>
+#include <named/mysqldb.h>
...
+ mysqldb_init();
+
ns_server_create(ns_g_mctx, &ns_g_server);
...
ns_server_destroy(&ns_g_server);

+ mysqldb_clear();
+
ns_builtin_deinit();
...
bind9-9.8.4.dfsg.P1/bin/named/Makefile.in:
...
-DBDRIVER_OBJS =
-DBDRIVER_SRCS =
-DBDRIVER_INCLUDES =
-DBDRIVER_LIBS =
+DBDRIVER_OBJS = mysqldb.@O@
+DBDRIVER_SRCS = mysqldb.c
+DBDRIVER_INCLUDES = -I/usr/include/mysql -fno-omit-frame-pointer -g -pipe -Wno-uninitialized -g -static-libgcc -fno-omit-frame-pointer -fno-strict-aliasing
+DBDRIVER_LIBS = -L/usr/lib -lmysqlclient
...
  • Update changelog (dch) and rebuild package (debuild -us -uc)
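The last step might look roughly like this (the local version suffix is just an example):

# cd bind9-9.8.4.dfsg.P1
# dch --local +mysqlsdb "Rebuild with MySQL SDB backend"
# debuild -us -uc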
Nov 20, 2014
 

As a reminder, here's how to enable a serial console under KVM.

Hypervisor (CentOS 7):
– no changes are required if the needed pty devices are created automatically (look for -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 in the guest's command line)
– if not, you need the following bit in the devices section of the guest's XML file (modifying it usually requires a full shutdown-start cycle of the virtual machine):

<serial type='pty'>
  <target port='0'/>
</serial>
<console type='pty'>
  <target type='serial' port='0'/>
</console>

Guest (Debian 7):
– modify /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet"
#GRUB_TERMINAL=console

->

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 quiet"
GRUB_TERMINAL=serial
GRUB_SERIAL_COMMAND="serial"

Uncomment the following line from /etc/inittab:

T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100

Run update-grub and reboot the virtual machine – now you should be able to use virsh console on the hypervisor.
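On the hypervisor, connecting then looks like this (the guest name is just an example):

# virsh console guest1

Exit the console again with Ctrl+].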

<edit-16.1.2015>
– Added XML-configuration for serial/console
– Dropped 9600bps speed configuration
</edit>

Oct 17, 2014
 

The idea is to build an HAProxy configuration that always directs traffic to a working backend server. Most tutorials suggest using HAProxy with "option mysql-check", which isn't enough here: a MariaDB Galera node can answer that check correctly and still be unavailable in the DONOR state.

Prerequisites:

  • Working MariaDB Galera Cluster
  • Haproxy installed on the application server
  • xinetd installed on the database server

 

On the application server we configure HAProxy to listen on 127.0.0.1:3306 for database connections, do weighted load balancing with health checks against port 9200 on the database servers, and expose an administrative interface (for debugging) on port 9600. As always, review configurations and scripts yourself – never just copy & paste from the internet.

# cat /etc/haproxy/haproxy.cfg

global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
node APP1
description APP1.macronet.fi
maxconn 40000
spread-checks 3
quiet

defaults
log global
mode tcp
option tcp-smart-accept
option tcp-smart-connect
option dontlognull
option redispatch
timeout check 3500
timeout queue 3500
timeout connect 3500
timeout client 10000
timeout server 10000

userlist STATS
group admin users admin
user admin insecure-password adminpass
user stats insecure-password userpass

listen adminpage
bind *:9600
mode http
stats enable
stats refresh 60s
stats uri /
acl AuthOkay_ReadOnly http_auth(STATS)
acl AuthOkay_Admin http_auth_group(STATS) admin

listen galeracluster
bind 127.0.0.1:3306
mode tcp
balance leastconn
timeout client 60000
timeout server 60000
option tcpka
option httpchk
option allbackups
server DB1 192.168.10.21:3306 check port 9200 weight 128
server DB2 192.168.10.22:3306 check port 9200 weight 64
server DB3 192.168.10.23:3306 check port 9200 weight 32

 

On the database server we need something that listens on port 9200 and tells HAProxy whether the node is OK. There are quite a few scripts going around the net already; this is just one mix and match of them. wsrep_local_state = 4 == SYNCED == OK!

# cat /opt/galeracheck.sh

#!/bin/bash
#
# This script checks if a mysql server is healthy running on localhost. It will
# return:
# "HTTP/1.x 200 OK\r" (if mysql is running smoothly)
# - OR -
# "HTTP/1.x 500 Internal Server Error\r" (else)
#
# The purpose of this script is make haproxy capable of monitoring mysql properly
#

MYSQL_HOST="127.0.0.1"
MYSQL_PORT="3306"
MYSQL_USERNAME="HAProxy"
MYSQL_PASSWORD="HAProxyPassword"
MYSQL_OPTS="-N -q -A"
TMP_FILE="/tmp/mysqlchk.$$.out"
ERR_FILE="/tmp/mysqlchk.$$.err"
FORCE_FAIL="/tmp/proxyoff"
MYSQL_BIN="/usr/bin/mysql"
CHECK_QUERY="show global status where variable_name='wsrep_local_state'"
preflight_check()
{
for I in "$TMP_FILE" "$ERR_FILE"; do
if [ -f "$I" ]; then
if [ ! -w $I ]; then
echo -e "HTTP/1.1 503 Service Unavailable\r\n"
echo -e "Content-Type: Content-Type: text/plain\r\n"
echo -e "\r\n"
echo -e "Cannot write to $I\r\n"
echo -e "\r\n"
exit 1
fi
fi
done
}
return_ok()
{
echo -e "HTTP/1.1 200 OK\r\n"
echo -e "Content-Type: text/html\r\n"
echo -e "Content-Length: 43\r\n"
echo -e "\r\n"
echo -e "<html><body>MariaDB Galera is OK!</body></html>\r\n"
echo -e "\r\n"
rm $ERR_FILE $TMP_FILE
exit 0
}
return_fail()
{
echo -e "HTTP/1.1 503 Service Unavailable\r\n"
echo -e "Content-Type: text/html\r\n"
echo -e "Content-Length: 42\r\n"
echo -e "\r\n"
echo -e "<html><body>MariaDB Galera is *down*!</body></html>\r\n"
sed -e 's/\n$/\r\n/' $ERR_FILE
echo -e "\r\n"
rm $ERR_FILE $TMP_FILE
exit 1
}
preflight_check
if [ -f "$FORCE_FAIL" ]; then
echo "$FORCE_FAIL found" > $ERR_FILE
return_fail;
fi
$MYSQL_BIN $MYSQL_OPTS --host=$MYSQL_HOST --port=$MYSQL_PORT --user=$MYSQL_USERNAME --password=$MYSQL_PASSWORD -e "$CHECK_QUERY" > $TMP_FILE 2> $ERR_FILE
if [ $? -ne 0 ]; then
return_fail;
fi
status=`cat $TMP_FILE | awk '{print $2;}'`

if [ $status -ne 4 ]; then
return_fail;
fi

return_ok;
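
Before hooking the script up to xinetd, it can be tested by hand on the database server – on a healthy, synced node it should print the same HTTP response that HAProxy will later see:

# chmod +x /opt/galeracheck.sh
# /opt/galeracheck.sh
HTTP/1.1 200 OK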

 

As you probably noticed, the check user needs permissions on the database server:

MariaDB [(none)]> GRANT USAGE ON *.* TO 'HAProxy' IDENTIFIED BY 'HAProxyPassword';

 

To make the script listen on a port, use something like the following xinetd service:

# cat /etc/xinetd.d/galeracheck
# default: on
# description: galeracheck
service galeracheck
{
flags = REUSE
socket_type = stream
port = 9200
wait = no
user = nobody
server = /opt/galeracheck.sh
log_on_failure += USERID
disable = no
# only_from = 0.0.0.0/0
# recommended to put the IPs that need to connect exclusively (security purposes)
per_source = UNLIMITED
}

 

And you’ll need to add the service to /etc/services and restart xinetd:

echo "galeracheck 9200/tcp # Galera clustercheck" >> /etc/services

 

After all of this, your result should be something like this:

# nc 192.168.10.21 9200
HTTP/1.1 200 OK

Content-Type: text/html

Content-Length: 43



<html><body>MariaDB Galera is OK!</body></html>

And your application should be able to query the database backend successfully through HAProxy.
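
A quick end-to-end check from the application server through the local HAProxy (user and credentials are placeholders):

# mysql -h 127.0.0.1 -P 3306 -u appuser -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"

A value of Synced means you landed on a healthy node.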

Oct 16, 2014
 

SSLv3 is now disabled in all of our services. If you're still using Windows XP and Internet Explorer 6: sorry.

Apache (grep -i SSLProtocol -R /etc/apache2/*):
SSLProtocol all -SSLv2
->
SSLProtocol all -SSLv2 -SSLv3

Nginx (grep -i ssl_protocols -R /etc/nginx/*) :
ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
->
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

Postfix (grep -i sslv -R /etc/postfix/* – if not found, add):
smtpd_tls_mandatory_protocols=!SSLv2
->
smtpd_tls_mandatory_protocols=!SSLv2,!SSLv3

Dovecot (grep -i sslv -R /etc/dovecot/* – might be commented by default):
ssl_protocols = !SSLv2
->
ssl_protocols = !SSLv2 !SSLv3

HAProxy v1.5 (add to your bind :443 -line):
no-sslv3
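
To verify that SSLv3 really is off, try an SSLv3-only handshake (the hostname is a placeholder, and the -ssl3 option is only available if your OpenSSL build still supports SSLv3):

# openssl s_client -connect www.example.com:443 -ssl3

If SSLv3 is disabled on the server, the handshake fails instead of printing the certificate chain.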

Oct 14, 2014
 

Usually one of the first things we want to do with a new server is to restrict access to the SSH service.

So far it seems everyone advises "disable firewalld, install the iptables service and use it like you always have" – but how about trying to get along with this new tech?

Restricting access to SSH isn't as hard as it might seem at first glance. First we check which services are allowed in the public (usually the default) and internal zones:

# firewall-cmd --zone=internal --list-services
dhcpv6-client ipp-client mdns samba-client ssh
# firewall-cmd --zone=public --list-services
dhcpv6-client ssh

Then we add our admin IP to the internal zone:

# firewall-cmd --permanent --zone=internal --add-source=<admin-ip>

Remove the SSH service from the public zone:

# firewall-cmd --permanent --zone=public --remove-service=ssh

And reload to apply the changes:

# firewall-cmd --reload
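
To verify the result, list the active configuration of both zones:

# firewall-cmd --zone=internal --list-all
# firewall-cmd --zone=public --list-all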

--permanent makes changes that persist across reboot/reload, but they aren't active immediately – without --permanent the changes take effect immediately but are lost on reload/reboot.

Service definitions can be found (in RHEL/CentOS 7) under /etc/firewalld/services/ – if you create a new one, use --reload to make it active.
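
As a sketch, a minimal custom definition (hypothetical service name and port) dropped into /etc/firewalld/services/myservice.xml could look like this – remember --reload afterwards:

<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>myservice</short>
  <description>Hypothetical custom service on TCP/2222</description>
  <port protocol="tcp" port="2222"/>
</service>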

Oct 09, 2014
 

Nowadays LVM has a cache feature that lets us bolt an SSD onto a logical volume as a cache device.

Let’s imagine we have the following setup:

  •  4x 2TB SATA disks in RAID10 configuration, /dev/md0
  • 2x 120GB SSD disks in RAID1 configuration, /dev/md1

First we’ll create the logical volume which we’ll be working with:

# pvcreate /dev/md0
# vgcreate storage /dev/md0
# lvcreate -n volume -L 4TB storage /dev/md0

Next we'll bolt the cache device (which should be RAID1-mirrored in case of disk failure) onto the volume. First we'll extend the volume group to contain the SSD device:

# vgextend storage /dev/md1

Then we'll create a cache volume and a metadata volume (1GB is left free on purpose):

# lvcreate -n metadata -L 1GB storage /dev/md1
# lvcreate -n cache -L 118GB storage /dev/md1

Now we'll convert these into a cache pool (this will fail if there isn't at least as much free space as is used for metadata, here 1GB, because it's needed for failure recovery):

# lvconvert --type cache-pool --poolmetadata storage/metadata storage/cache

Then all what’s left is attaching the cache to a logical volume:

# lvconvert --type cache --cachepool storage/cache storage/volume

It should say “storage/volume is now cached” and lvs output should look something like this:

# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
cache storage Cwi---C--- 118.00g
volume storage Cwi-a-C--- 4.0t cache [storage_corig]

Oh, and if you want the cache to survive a reboot, you'll need a package that provides the /usr/sbin/cache_check binary. In Debian that's "thin-provisioning-tools", and in RHEL/CentOS/derivatives the package is device-mapper-persistent-data.
Tests were performed on Debian testing Jessie and CentOS 7.0.1406 Core in 10/2014. Official documentation can be found here.
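
If you want to see the hidden internal volumes that lvconvert created (the cache pool data and metadata plus the original origin volume), add -a to lvs:

# lvs -a storage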

Sep 17, 2014
 

Note-to-self:

  1. Create hosts in the new instance with the items you want to monitor
  2. Map itemids between the instances to specify which item data you want to copy from the old instance (one way to do this is sketched after the list)
  3. Copy the history(_log, _str, _text, _uint) and trends(_uint) data, enjoy:
  • INSERT INTO zabbix2.history (itemid,clock,value,ns) SELECT '25188',clock,value,ns FROM zabbix1.history WHERE itemid='90090000000025470';
  • INSERT IGNORE INTO zabbix2.trends (itemid,clock,num,value_min,value_avg,value_max) SELECT '25188',clock,num,value_min,value_avg,value_max FROM zabbix1.trends WHERE itemid='90090000000025470';
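
A sketch of one way to do the itemid mapping of step 2, assuming both databases are reachable from the same MySQL server and you know the item key (the key below is just an example – run the same query against both schemas and pair the results):

SELECT h.host, i.itemid, i.key_
FROM zabbix2.items i
JOIN zabbix2.hosts h ON h.hostid = i.hostid
WHERE i.key_ = 'system.cpu.load[percpu,avg1]';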
Jul 17, 2014
 

Hi,

A couple of servers broke down almost simultaneously; everything has been moved to a new platform without any data loss.

Sorry for a short downtime, keep calm and carry on!

Mar 23, 2014
 

Remember: everything you do happens at your own risk. The steps below CAN CAUSE DATA LOSS. Do not run them in production without understanding what you are doing and testing them in a development environment first.
And don’t blame me if it doesn’t work for you.

Before: 5 disks in RAID5 with LUKS -encryption and LVM
After: 7 disks in RAID6 with LUKS -encryption and LVM

RAID details: # mdadm --detail /dev/md0

  1. Add spare drive to RAID5-array:
    # mdadm --add /dev/md0 /dev/sdf
  2. Convert RAID5 to RAID6:
    # mdadm --grow /dev/md0 --level=6 --raid-devices=6 --backup-file=/root/raidbackup
  3. WAIT for the array to rebuild itself (will take a while):
    # cat /proc/mdstat
  4. After rebuild, add another drive to RAID6-array:
    # mdadm --add /dev/md0 /dev/sdg
  5. Take the drive into use (adding makes it a spare):
    # mdadm --grow /dev/md0 --level=6 --raid-devices=7 --backup-file=/root/raidbackup
  6. WAIT for the array to rebuild itself (will take a while):
    # cat /proc/mdstat
  7. Grow the RAID6-array to the maximum size:
    # mdadm --grow /dev/md0 --size=max
  8. Open encrypted disk (if not already open):
    # cryptsetup luksOpen /dev/md0 crypt
  9. Resize encrypted disk:
    # cryptsetup resize /dev/mapper/crypt
  10. Resize LVM physical volume:
    # pvresize /dev/mapper/crypt

And now you have more space that you can manage with LVM.
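
From here the extra space can be handed out as usual, for example (volume group, logical volume and size are placeholders – use the resize tool that matches your filesystem):

# vgs
# lvextend -L +2T /dev/vg0/data
# resize2fs /dev/vg0/data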

Note: if I remember correctly, step 7 can be combined with step 5:

  • # mdadm --grow /dev/md0 --level=6 --raid-devices=7 --size=max --backup-file=/root/raidbackup

– and the resizing could be done while the array is still rebuilding, but waiting keeps us on the safe side…