Java Data Structures

There are a number of commonly used Java data structures. Here are links to additional documentation and highlights of structural aspects: 

  • Array: fixed size, best for primitives 
  • List: fixed size, interface, ordered, allows duplicate
  • ArrayList: resizable , add to top, preferred unless need the capabilities of another structure
  • Vector: resizable, synchronized, add to top
  • Stack: LIFO, push/pop
  • Queue: FIFO, add to bottom/remove from top, interface
  • LinkedList: doubly-linked, push/pop, many other methods
  • Set: no duplicates, interface
  • HashMap: random access
  • TreeMap: sorted, efficient search
  • Binary Tree: each node has <=2 children
  • Binary Search Tree: sorted, fast lookup/add/remove
  • k-d Tree: k dimensional, space-partitioning, nearest neighbor search
  • Red-Black Tree: self-balancing, binary search tree, extra bit per node
  • Iterator: move forward and backward through data
  • Collections: perform operations on collections of data
  • Working with Java Lists: methods, iterators, collections
  • Data Structure Decisions: change, access, indexing, sizing
  • Common Java Array Manipulation Constructs: split, sort, binary search, iteration
  • Common Java ArrayList Manipulation Constructs: add, indexOf, remove, sort, binary search, interator
  • B-tree and Binary Search Tree Data Structures: like TreeMap
  • Sorting Algorithms: Collections sort, exchange, hybrid, insertion, merge
  • Big O Notation: O(1), O(log (n)), O(n), O(n log(n))
  • Update and Access Overview: The diagram below indicates how elements and values are updated and accessed. This is a broad overview of commonly used data structure classes, interfaces and objects ... please see the links above and below for more details. Some general observations are:
    • Arrays are different than the other structures in that they are accessed using bracketed index [n] references, such as myStringArray[3] instead of methods and can only contain primitives, not objects.
    • ArrayLists are generally preferred over Lists as they allow resizing. It's a very commonly used structure.
    • Vectors are much like ArrayLists with the additional characteristic that they are synchronized.
    • Queues and Stacks are specialized for LIFO and FIFO access.
    • LinkedLists are powerful structures that require more space for storage and time to execute some operations but are efficient for inserting and deleting objects in the existing order.
    • Sets are simple structures that don't allow duplicates.
    • HashMaps are commonly used when rapid, direct access to elements is the primary requirement.
    • TreeMaps are preferred when rapid search is the primary requirement.
    • Iterators and Collections operate on whole data structures.
    • ( ) empty parameter methods generally operate on the ends of a structure.
    • (element | object) only parameter methods generally operate on sets or the ends of structures.
    • (index | key) and (index | key, element | object) parameter methods generally operate on specific places within a structure.

Common Java ArrayList Manipulation Constructs

ArrayList data structures (also called Dynamic Arrays) are a frequently used programming function. ArrayLists have an advantage over arrays - ArrayLists are not fixed in size and can be dynamically changed by adding and deleting elements. Below are some common Java array manipulation constructs. For an overview of working with Java lists, look here.

Creating a new ArrayList

Creating a new ArrayList is coded as shown below using the String type with an initial capacity of 20 entries:

ArrayList mArrayList = new ArrayList<String>(int 20);

Constructing an ArrayList from an Existing List

The example below shows how to construct an ArrayList from the individual words in a String using an intermediary List object. Notice that the sequence of objects created is: String -> String array -> List -> ArrayList

String wordString = "this is a string of words";
String[] wordsArray = wordString.split(" ");
List<String> wordsList = Arrays.asList(wordsArray);
ArrayList wordsArrayList = new ArrayList<String>(wordsList);

add( )

The example below shows how to use the ArrayList add() method to add a String to wordsArrayList::

wordsArrayList.add("Example2");

indexOf( )

The example below shows how to use the indexOf() ArrayList method find the index of an entry:

int index = wordsArrayList.indexOf("Example2");

remove( )

The example below shows how to use the ArrayList remove() method using the entry index or the entry itself:

wordsArrayList.remove(index);
wordsArrayList.remove("Example2");

sort( )

The Collections sort() methods provide a variety of options for sorting ArrayList entries. Note that the ArrayList is sorted in place without returning a new ArrayList. This can be done because the array parameter is a value pointing to the ArrayList of objects. For example, the following statement would sort the words ArrayList from the example above using a modified MergeSort algorithm in an average O(n log(n)) time performance:

Collections.sort(wordsArrayList);

binarySearch( )

The Collections binarySearch() methods provide an efficient way to find an entry in a sorted ArrayList. For example, the following statement would find the index of the wordsArrayList entry for the String Example1 in an average of O(log(n)) time performance:

int entryIndex = Collections.binarySearch(wordsArrayList, "Example1");

iterator( )

The Iterator Class and ArrayList interator() method can be used to iterate over the entries of an ArrayList as shown in the example below :

Iterator mIterator = wordsArrayList.iterator();
while(mIterator.hasNext()){
System.out.println(mIterator.next());
};

Common Java Array Manipulation Constructs

Working with array data structures is one of the most frequently used programming functions. Arrays are implemented in a number of Java classes, such as String, Integer, and Boolean. Those arrays can be manipulated using the Java Arrays class (note that the Arrays class name is a plural). Below are some common Java array manipulation constructs.

split( )

The String split() method separates string characters into array entries using a supplied regular expression. For example, the following statement would create entries in the String array wordsArray using a blank character as a regular expression separator for processing the String wordString:

String wordString = "this is a string of words";
String[] wordsArray = wordString.split(" ");

sort( )

The Arrays sort() methods provide a variety of options for sorting array entries. Note that the array is sorted in place without returning a new array. This can be done because the array parameter is a value pointing to the array of objects. For example, the following statement would sort the words array from the split() example above using a Quicksort process in an average O(n log(n)) time performance:

Arrays.sort(wordsArray);

binarySearch( )

The Arrays binarySearch() methods provide an efficient way to find an entry in a sorted array. For example, the following statement would find the index of the array wordsArray entry for the String entryValue in an average of O(log(n)) time performance:

int entryIndex = Arrays.binarySearch(wordsArray, entryValue);

for( )

The for() construct can be used to iterate over the entries in an array. For example, the following statement would iterate over and print every entry in wordsArray:

for (String word : wordsArray) {System.out.print(word);}

If you want to know and use the index of each entry, then this construct can be used:

for (int i=0; i<wordsArray.length; i++) {System.out.print(wordsArray(i);}

B-tree and Binary Search Tree Data Structures

B-tree and Binary Search Tree data structures are similar but different ways to store data. The advantage of using search trees is that the test for membership can be performed efficiently provided that the tree is reasonably balanced, that is, the leaves of the tree are at comparable depths.

B-tree Implementations

B-tree implementations are normally commercial. Languages don't typically provide direct B-tree support. To find B-tree implementations, search Google for B-tree software.

Binary Search Tree Implementations

Some languages do provide support for Binary Search Trees. In Java, see the TreeMap class, which implements a variant of the Binary Search Tree, the Red-Black Tree.

Binary Tree Data Structure Comparison

Both structures operate in the average in O(log n) time. Note that in the worst case, B-tree, at O(log n), is faster than Binary Search Tree at O(n).

Big O Notation

Big O notation is a way to characterize the time or resources needed to solve a computing problem.  It's particularly useful in comparing various computing algorithms under consideration. Below is a table summarizing Big O functions. The four most commonly referenced and important to remember are:

  • O(1)              Constant access time such as the use of a hash table.
  • O(log n)        Logarithmic access time such as a binary search of a sorted table.
  • O(n)              Linear access time such as the search of an unsorted list.
  • O(n log(n))   Multiple of log(n) access time such as using Quicksort or Mergesort.

Sorting Algorithms

Before delving into the large variety of sorting algorithms, it's important to understand that there are simple ways to perform sorts in most programming languages. For example, in Java, the Collections class contains a sort method that can be implemented as simply as:

Collections.sort(list);

where list is an object such as an ArrayList. Some aspects of the Java sort() method to note are:

  • You can also use the Java Comparator class methods to implement your own list item comparison functions for specialized sorting order needs.
  • The Java Collections class (plural) is different than the Collection class (singular).
  • Which sorting algorithm (see below) is used in the Collections.sort() implementation depends on the implementation approach chosen by the Java language developers. You can find implementation notes for Java Collections.sort() here. It currently uses a modified merge sort that performs in the range of Big O O(n log(n)).

If you don't want to use a built-in sort function and you're going to implement your own sort function, there's a large list of sorting algorithms to choose from.  Factors to consider in choosing a sort algorithm include:

  • Speed of the algorithm for the best, average and worst sort times. Most algorithms have sort times characterized by Big O functions of O(n), O(n log(n)) or O(n^2).
  • The relative importance of the best, average and worst sort times for the sort application.
  • Memory required to perform the sort.
  • Type of data to be sorted (e.g., numbers, strings, documents).
  • The size of the data set to be sorted.
  • The need for sort stability (preserving the original order of duplicate items).
  • Distribution and uniformity of values to be sorted.
  • Complexity of the sort algorithm.

For a comparison of sorting algorithms based on these and other values see: https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms.

Here's a quick reference for the major algorithms:

  • Exchange Sorts: based on swapping items
    • Bubble sort: for each pair of indices, swap the items if out of order, loop for items and list.
    • Cocktail sort: variation of bubble sort, passes alternately from top to bottom and bottom to top.
    • Comb sort: variation of bubble sort, selective swap of values
    • Gnome sort: also called the stupd sort, moves values back to just above a value less than it
    • Odd-even sort: developed originally for use with parallel processors, examines odd-even pairs and orders them, alternates pairs until list is ordered
    • Quicksort: divide list into two, with all items on the first list coming before all items on the second list.; then sort the two lists. Repeat. Often the method of choice. One of the fastest sort algorithms
  • Hybrid Sorts: mixture of sort techniques
    • Flashsort: Used on data sets with a known distribution, estimates used for where an element should be placed
    • Introsort: begin with quicksort and switch to heapsort when the recursion depth exceeds a certain level
    • Timsort: adaptative algorithm derived from merge sort and insertion sort. 
  • Insertion sorts: builds the final sorted array one item at a time
    • Insertion sort: determine where the current item belongs in the list of sorted ones, and insert it there
    • Library sort: like library shelves, space is created for new entries within groups such as first letters, space is removed at end of sort
    • Patience sorting: based on the solitaire card game, uses piles of "cards"
    • Shell sort: an attempt to improve insertion sort
    • Tree sort (binary tree sort): build binary tree, then traverse it to create sorted list
    • Cycle sort: in-place with theoretically optimal number of writes
  • Merge sortstakes advantage of the ease of merging already sorted lists into a new sorted list
    • Merge sort: sort the first and second half of the list separately, then merge the sorted lists
    • Strand sort: repeatedly pulls sorted sublists out of the list to be sorted and merges them with the result array
  • Non-comparison sorts
    • Bead sort: can only be used on positive integers, performed using mechanics like beads on an abacus
    • Bucket sort: works by partitioning an array into a number of buckets, each bucket is then sorted individually using the best technique, the buckets are then merged
    • Burstsort: used for sorting strings, employs growable arrays
    • Counting sort: sorts a collection of objects according to keys that are small integers
    • Pigeonhole sort: suitable for sorting lists of elements where the number of elements and number of possible key values are approximately the same, uses auxiliary arrays for grouping values
    • Postman sort: variant of Bucket sort which takes advantage of hierarchical structure
    • Radix sort: sorts strings letter by letter
  • Selection sorts: in place comparison sorts
    • Heapsort: convert the list into a heap, keep removing the largest element from the heap and adding it to the end of the list
    • Selection sort: pick the smallest of the remaining elements, add it to the end of the sorted list
    • Smoothsort
  • Other
  • Unknown class

 

Android/PHP/JQuery RESTful JSON MySQL Client-Server Overview

The title of this blog entry is quite a mouthful.  The purpose is to give a broad overview of the moving parts necessary to implement an application with client mobile and desktop devices that interact with a server database to alter and retrieve data.

I won't define all the classes, methods and acronyms on the graphic as they're easy to look up using an Internet search.  There are other choices that can be made for the details of the implementation, but this should provide a starting point for thinking through the necessary elements.

The communication vehicle for client-server interaction is typically the Internet Protocol Suite.

 

SQL Database Join Operations

It can be challenging to remember the details of all the SQL Join operations.  A convenient way to do this is to use Venn diagrams shown below. For examples of SQL Joins, go here

Commonly used SQL Join operations are:

  • inner join: requires each record in the two joined tables to have matching records.
  • outer join: does not require each record in the two joined tables to have a matching record.

Object Oriented Programming Data Structure Decisions

When it comes to organizing data in an Object Oriented Program, there are lots of choices and decisions to make. It can at times be difficult to think through all the criteria for selecting and implementing the right data structure.

The decision table below can help work through the data design process.