AWS : Cassandra on a Single-Node Cluster
Apache Cassandra is a highly scalable NoSQL database system that usually runs on a multi-node setup.
It is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. - wiki
Here is a summary of the characteristics of Apache Cassandra:
- One of the most popular structured storage DBMS.
- Uses a hybrid key-value / wide column architecture.
- Originally created at Facebook, now open source.
- Written in Java, so it runs cross-platform.
- Supports massively distributed environments.
- Highly scalable and decentralized: all nodes (servers) have the same role, which means there is no single point of failure.
- Automatic data replication: highly fault tolerant, which means individual nodes can fail without downtime.
- Supports MapReduce.
- Used by CERN, Digg, Instagram, Netflix, Reddit, Walmart, Twitter, etc.
- In Cassandra, related data for an application is stored in a container known as a keyspace, which contains one or more column families (called tables in CQL).
In this tutorial, we'll learn how to install and use Cassandra on a single-node cluster on Ubuntu 14.04.
Cassandra requires the Java SE Runtime Environment (JRE). Cassandra 3.0 and later require Java 8; use the latest stable update. Check the installed version:
$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~14.04-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
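The version check above can also be scripted. The following is a minimal sketch, not part of the original tutorial: the helper names `java_major` and `installed_java_version` are hypothetical, and it assumes `java -version` prints a quoted version string in the form shown above.

```shell
# Hypothetical helper: extract the Java major version from a version string.
# Handles the legacy "1.8.0_91" scheme and the newer "11.0.2" scheme.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 ;;
  esac
}

# Hypothetical helper: read the quoted version from `java -version`,
# which writes to stderr, e.g.: openjdk version "1.8.0_91"
installed_java_version() {
  java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Only run the check when java is on the PATH.
if command -v java >/dev/null 2>&1; then
  major=$(java_major "$(installed_java_version)")
  if [ "$major" = "8" ]; then
    echo "Java 8 detected"
  else
    echo "Found Java major version '$major'; Cassandra 3.x expects Java 8" >&2
  fi
fi
```

With the output shown above, `java_major 1.8.0_91` yields 8, so the check passes.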
Next, add the Apache Cassandra repository to the APT sources. The 35x series corresponds to Cassandra 3.5:
$ echo "deb http://www.apache.org/dist/cassandra/debian 35x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
$ echo "deb-src http://www.apache.org/dist/cassandra/debian 35x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
We need to add three public keys from the Apache Software Foundation:
$ sudo gpg --keyserver pgp.mit.edu --recv-keys F758CE318D77295D
$ sudo gpg --export --armor F758CE318D77295D | sudo apt-key add -
$ sudo gpg --keyserver pgp.mit.edu --recv-keys 2B5C1B00
$ sudo gpg --export --armor 2B5C1B00 | sudo apt-key add -
$ sudo gpg --keyserver pgp.mit.edu --recv-keys 0353B12C
$ sudo gpg --export --armor 0353B12C | sudo apt-key add -
Update the package index:
$ sudo apt-get update
Now we want to install Cassandra:
$ sudo apt-get install cassandra
We may run into a connection issue. To avoid it, replace <public name> with 127.0.0.1 in /etc/cassandra/cassandra-env.sh so that the line reads:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=127.0.0.1"
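The same edit can be made non-interactively with sed. This is a sketch under an assumption: that the stock cassandra-env.sh ships the line commented out, as `# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"`. The function name `set_rmi_hostname` is hypothetical.

```shell
# Hypothetical helper: uncomment the RMI hostname line and fill in the host.
# Assumes the file contains a line such as:
#   # JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"
# A backup copy of the original file is kept as <file>.bak.
set_rmi_hostname() {
  file="$1"
  host="$2"
  sed -i.bak \
    -e '/-Djava\.rmi\.server\.hostname=<public name>/ s/^#[[:space:]]*//' \
    -e "s/-Djava\.rmi\.server\.hostname=<public name>/-Djava.rmi.server.hostname=$host/" \
    "$file"
}
```

On the real file this would be called as `set_rmi_hostname /etc/cassandra/cassandra-env.sh 127.0.0.1` from a root shell, since the file is not writable by ordinary users. It is worth comparing the result against the .bak copy before restarting Cassandra.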
Cassandra should start automatically once it is installed, but in this setup it did not, apparently because of a bug.
To confirm that it is not running, type:
$ sudo service cassandra status
* could not access pidfile for Cassandra
To start:
$ sudo /etc/init.d/cassandra start
$ sudo service cassandra status
* Cassandra is running
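After the service reports that it is running, Cassandra can still take a few seconds before it accepts client connections. A small retry loop is one way to wait for it. The `retry` function below is a hypothetical helper, not something shipped with Cassandra.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    # Stop as soon as the probe command succeeds.
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}
```

For example, `retry 30 2 nodetool status` would probe the node every 2 seconds, up to 30 times. Both numbers are arbitrary choices for illustration.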
Installation created the following directories.
- /var/lib/cassandra (data directories)
- /var/log/cassandra (log directory)
- /var/run/cassandra (runtime files)
- /usr/share/cassandra (environment settings)
- /usr/share/cassandra/lib (JAR files)
- /usr/bin (binary files)
- /usr/sbin
- /etc/cassandra (configuration files)
- /etc/init.d (service startup script)
- /etc/security/limits.d (cassandra user limits)
- /etc/default
Start the CQL shell with the cqlsh command. On success, it prints a banner followed by the cqlsh prompt:
$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cqlsh>
We may get the following connection error:
$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
To fix the issue, install the Python Cassandra driver, then define and export the environment variable CQLSH_NO_BUNDLED:
$ sudo pip install cassandra-driver
$ export CQLSH_NO_BUNDLED=true
This installs the latest Python Cassandra driver and tells cqlsh (which is a Python program) to use that external driver instead of the one bundled with the distribution.
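Once cqlsh connects, a short CQL script can confirm that the single-node cluster works. The sketch below only prints the statements. The keyspace name `demo`, the table `users`, and the function name `demo_cql` are all hypothetical and chosen for illustration. A replication factor of 1 is used because there is only one node.

```shell
# Hypothetical helper: print a small CQL script to stdout.
# The keyspace "demo" and table "users" are illustrative names only.
demo_cql() {
  cat <<'CQL'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo;
CREATE TABLE IF NOT EXISTS users (
  user_id int PRIMARY KEY,
  name text
);
INSERT INTO users (user_id, name) VALUES (1, 'alice');
SELECT * FROM users;
CQL
}
```

To run it against the local node, save the output to a file and pass it to cqlsh with the -f option: `demo_cql > demo.cql && cqlsh -f demo.cql`.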
Ph.D. / Golden Gate Ave, San Francisco / Seoul National Univ / Carnegie Mellon / UC Berkeley / DevOps / Deep Learning / Visualization