Using DynamoDB Global Secondary Indexes

DynamoDB

DynamoDB is a schema-less NoSQL data store hosted by AWS.  It offers auto-scaling, read/write capacity management and alerting, all controlled through CloudWatch.
When creating a Dynamo table you assign a Key consisting of either a Hash or a Hash and Range, and the read/write capacity (the number of consistent reads and writes that can be performed per second).  When Dynamo was first released there were three ways of accessing an item: by Key (if a Key has both a Hash and Range then both must be set), by Query and by Scan.  Accessing an item by Key requires you to know the exact value of the Hash, or of the Hash and Range.  Query allows you to perform comparisons against the Range part of the Key, but you must still specify which Hash you want to search.  Scan searches across any attribute without having to specify the Hash/Range, but in doing so it reads every item in your table, so it can be an expensive operation if your table has hundreds or thousands of items.

Global Secondary Indexes to the rescue

Recently Dynamo released Global Secondary Indexes, which are additional indexes (max 5 at present) that can be set against any column in your table.  The only limitation is that they must be created up front.  Each index always returns the Key from the main table, plus an optional list of projected attributes (an attribute is the equivalent of a column in SQL).
When designing your index you need to weigh cost against performance.  Each index is a copy of the data from the main table and will contain at least the index Key and the table’s Key, plus any projected attributes.  Including all attributes in an index will double the storage cost, as every attribute is stored twice, but a Query will not require any additional reads from the table.  An index that only has a Key will be slower to use, as you will need to perform additional reads against the table using the table’s Key from the Query result.  You can of course project a subset of the attributes, but because an index has to be created when the table is created, later schema changes (for example, additional attributes) cannot be added to an index.
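To illustrate the extra work a Key-only index implies, here is a minimal sketch that turns the items returned by an index Query into follow-up GetItem requests against the main table.  The class and method names are hypothetical, and it assumes the AWSSDK.DynamoDBv2 package, where GetItemRequest.Key is a Dictionary<string, AttributeValue>.  It only builds the requests; sending them (one extra read per item) is left to your client code.

```csharp
using System.Collections.Generic;
using System.Linq;
using Amazon.DynamoDBv2.Model;

// Hypothetical helper, not part of the original post.
public static class KeysOnlyIndexReads
{
    // For each item returned by the index Query, copy the table's Key
    // attributes into a GetItemRequest so the full item can be fetched.
    public static List<GetItemRequest> BuildFollowUpReads(
        string tableName,
        IEnumerable<Dictionary<string, AttributeValue>> indexItems,
        params string[] tableKeyAttributes)
    {
        return indexItems
            .Select(item => new GetItemRequest
            {
                TableName = tableName,
                // Only the table's Key attributes are needed for the lookup.
                Key = tableKeyAttributes.ToDictionary(name => name, name => item[name])
            })
            .ToList();
    }
}
```

Each request costs read capacity on the main table, which is the performance price paid for the smaller index.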
To Query an index you set the IndexName and define a list of KeyConditions, which are the comparisons you want to perform against the Key of the specified index.  Be warned: you can only perform an equality (EQ) comparison against a Hash Key, whereas a Range Key also accepts range comparisons such as LT, LE, GT, GE, BETWEEN and BEGINS_WITH.

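using System.Collections.Generic;
using Amazon.DynamoDBv2.Model;

// Return the first 10 ratings for restaurant 1, read from the index rather than the table.
var query = new QueryRequest
{
	TableName = "Ratings",
	IndexName = "RestId-Index",
	KeyConditions = new Dictionary<string, Condition>
	{
		{
		"RestaurantId", new Condition
			{
				// The Hash Key of an index only supports EQ.
				ComparisonOperator = "EQ",
				// Numbers are sent as strings in the N property.
				AttributeValueList = new List<AttributeValue> {new AttributeValue {N = "1"}}
			}
		}
	},
	Limit = 10,
	// true returns results in ascending Range Key order.
	ScanIndexForward = true
};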
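The query above only places an equality condition on the Hash Key.  As a sketch of a Range Key condition, the example below adds a BETWEEN comparison.  It assumes the index also defines a Range Key called "Score", which is a hypothetical attribute and not something the index above is known to have; the class and method names are likewise illustrative.

```csharp
using System.Collections.Generic;
using System.Globalization;
using Amazon.DynamoDBv2.Model;

// Hypothetical helper, not part of the original post.
public static class RatingQueries
{
    // Builds a Query with EQ on the Hash Key and BETWEEN on an assumed
    // Range Key named "Score".
    public static QueryRequest BuildScoreRangeQuery(int restaurantId, int minScore, int maxScore)
    {
        return new QueryRequest
        {
            TableName = "Ratings",
            IndexName = "RestId-Index",
            KeyConditions = new Dictionary<string, Condition>
            {
                {
                    "RestaurantId", new Condition
                    {
                        ComparisonOperator = "EQ",
                        AttributeValueList = new List<AttributeValue> { Number(restaurantId) }
                    }
                },
                {
                    "Score", new Condition
                    {
                        // BETWEEN takes exactly two values: lower and upper bound (inclusive).
                        ComparisonOperator = "BETWEEN",
                        AttributeValueList = new List<AttributeValue> { Number(minScore), Number(maxScore) }
                    }
                }
            },
            Limit = 10,
            // false returns the highest Range Key values first.
            ScanIndexForward = false
        };
    }

    // DynamoDB numbers travel as strings in the N property.
    private static AttributeValue Number(int value)
    {
        return new AttributeValue { N = value.ToString(CultureInfo.InvariantCulture) };
    }
}
```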

When you Query an index the data is returned as a List of Dictionary<string, AttributeValue>, where the string is the name of the attribute and the AttributeValue holds its value.  An attribute can be one of three base types, string (S), number (N) or binary (B), or a set of one of those types (SS, NS and BS).  The AttributeValue has a separate property for each type, which makes mapping the data to an entity more difficult and not very DRY.

Query Result Mapping

Here are the steps to create your own generic entity mapper.
1) Get the list of public properties from T.

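var type = typeof(T);
// new T() only compiles if the mapper declares the constraint "where T : new()".
var ret = new T();
// Public instance properties only; static and non-public members are ignored.
var entityProperties = type.GetProperties(BindingFlags.Instance | BindingFlags.Public);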

2) Loop over each property in T.
3) Get the attribute name.

private static string GetAttributeName(PropertyInfo property)
{
	var attributeName = property.Name;
	var dynamoDbProperty = property.GetCustomAttributes(typeof (DynamoDBPropertyAttribute), true).SingleOrDefault();
	if (dynamoDbProperty != null)
	{
		var dynamoDbPropertyValue = (DynamoDBPropertyAttribute) dynamoDbProperty;
		if (!string.IsNullOrWhiteSpace(dynamoDbPropertyValue.AttributeName))
		{
			attributeName = dynamoDbPropertyValue.AttributeName;
		}
	}
	return attributeName;
}

4) Find the attribute in the Dictionary.

private static AttributeValue GetAttribute(IDictionary<string, AttributeValue> item, string attributeName)
{
	// TryGetValue already covers the missing-key case, so no ContainsKey check is needed.
	AttributeValue value;
	return item.TryGetValue(attributeName, out value) ? value : null;
}

5) If the attribute is not found, no mapping is required, so the property keeps its default value.
6) If the attribute is found, convert it to the property’s type.

var propertyType = property.PropertyType;
switch (Type.GetTypeCode(propertyType))
{
	case TypeCode.String:
		property.SetValue(ret, attribute.S, null);
		break;
	case TypeCode.Int32:
		var intValue = ParseInt(attribute, attributeName);
		property.SetValue(ret, intValue, null);
		break;
	case TypeCode.Boolean:
		var booleanValue = ParseBoolean(attribute);
		property.SetValue(ret, booleanValue, null);
		break;
	case TypeCode.DateTime:
		var dateTimeValue = ParseDateTime(attribute, attributeName);
		property.SetValue(ret, dateTimeValue, null);
		break;
	default:
		var attributeValue = GetNonPrimitiveTypeValue(attribute);
		if (attributeValue != null)
		{
			var serializedItem = attributeValue is string
				? attributeValue.ToString()
				: _serializer.Serialize(attributeValue);
			var deserializedItem = _serializer.Deserialize(serializedItem, propertyType);
			if (deserializedItem != null)
			{
				property.SetValue(ret, deserializedItem, null);
			}
		}
		break;
}
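The switch above calls ParseInt, ParseBoolean and ParseDateTime, which are not shown.  The following is one possible sketch of those helpers, not the original implementation.  It assumes integers are stored in N, booleans as 1/0 in N (falling back to "true"/"false" in S), and dates as ISO 8601 strings in S; adjust it if your table stores these differently.  The class name AttributeParsers is hypothetical.

```csharp
using System;
using System.Globalization;
using Amazon.DynamoDBv2.Model;

// Hypothetical helpers; the storage formats are assumptions (see comments).
public static class AttributeParsers
{
    // Assumes the integer is stored in the number (N) property.
    public static int ParseInt(AttributeValue attribute, string attributeName)
    {
        int result;
        if (!int.TryParse(attribute.N, NumberStyles.Integer, CultureInfo.InvariantCulture, out result))
        {
            throw new FormatException(
                string.Format("Attribute '{0}' is not a valid integer: '{1}'", attributeName, attribute.N));
        }
        return result;
    }

    // Assumes 1/0 in N; otherwise falls back to "true"/"false" in S.
    // Anything unparseable is treated as false.
    public static bool ParseBoolean(AttributeValue attribute)
    {
        if (!string.IsNullOrWhiteSpace(attribute.N))
        {
            return attribute.N == "1";
        }
        bool result;
        return bool.TryParse(attribute.S, out result) && result;
    }

    // Assumes an ISO 8601 string in S, e.g. "2014-01-15T10:30:00.000Z".
    public static DateTime ParseDateTime(AttributeValue attribute, string attributeName)
    {
        DateTime result;
        if (!DateTime.TryParse(attribute.S, CultureInfo.InvariantCulture,
                DateTimeStyles.RoundtripKind, out result))
        {
            throw new FormatException(
                string.Format("Attribute '{0}' is not a valid date: '{1}'", attributeName, attribute.S));
        }
        return result;
    }
}
```

Passing the attribute name into ParseInt and ParseDateTime means a failed conversion reports which attribute was at fault, which helps when a schema-less table holds mixed data.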

To get a non-primitive type value you must check each typed property of the AttributeValue in turn (the binary types B and BS are not handled here).

private static object GetNonPrimitiveTypeValue(AttributeValue value)
{
	if (!string.IsNullOrWhiteSpace(value.S))
	{
		return value.S;
	}
	if (!string.IsNullOrWhiteSpace(value.N))
	{
		return value.N;
	}
	if (value.SS != null && value.SS.Any())
	{
		return value.SS;
	}
	if (value.NS != null && value.NS.Any())
	{
		return value.NS;
	}
	return null;
}

The IJsonSerializer interface is a wrapper around the serialization logic.

public interface IJsonSerializer
{
	// Takes object because the mapper passes values whose type is only known at runtime.
	string Serialize(object item);
	object Deserialize(string value, Type type);
}
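The post does not say which JSON library sits behind this interface.  As one possible sketch, here is an implementation based on System.Text.Json (.NET 5 or later); the class name is hypothetical, and the interface is repeated with an object parameter only so the example compiles on its own.  Because number sets (NS) arrive as strings, the options allow numbers to be read from JSON strings.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Repeated here so the sketch is self-contained.
public interface IJsonSerializer
{
    string Serialize(object item);
    object Deserialize(string value, Type type);
}

// Hypothetical implementation; the original serializer is not shown in the post.
public class SystemTextJsonSerializer : IJsonSerializer
{
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        // A number set serializes as ["1","2"]; this lets it
        // deserialize into a numeric collection such as List<int>.
        NumberHandling = JsonNumberHandling.AllowReadingFromString
    };

    public string Serialize(object item)
    {
        // Use the runtime type so derived and collection types serialize fully.
        return item == null ? "null" : JsonSerializer.Serialize(item, item.GetType(), Options);
    }

    public object Deserialize(string value, Type type)
    {
        return JsonSerializer.Deserialize(value, type, Options);
    }
}
```

With this in place, an NS value such as ["1","2"] can be round-tripped into a List<int> property, and an N value such as 3.5 can be deserialized directly into a float.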

This has been tested against String, Integer, Float, DateTime, Boolean, Collection and Non-Primitive Types so it should work against most custom entity classes.
Full class implementation

Conclusion

Dynamo is schema-less, so attributes can be of any type, and when you Query an index you will not always get back all of the attributes of an item, only those defined in the projected fields.  In my opinion, it is best to create a different entity/view to map against each index, as this helps to avoid attempting to map properties that do not exist in the index.
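As a small illustration of a per-index view, the class below contains only what a Query against "RestId-Index" might return.  The attribute names "RatingId" and "Score" are hypothetical, as the post does not list the table’s Key or the projected attributes.

```csharp
using Amazon.DynamoDBv2.DataModel;

// Hypothetical view for the "RestId-Index" index; it deliberately omits
// any table attribute that is not projected into the index.
public class RestaurantRatingView
{
    // Index Hash Key, as used in the Query example.
    [DynamoDBProperty("RestaurantId")]
    public int RestaurantId { get; set; }

    // Table Key, which every index returns (name assumed).
    [DynamoDBProperty("RatingId")]
    public string Id { get; set; }

    // A projected attribute (name assumed); with no DynamoDBProperty
    // attribute the mapper falls back to the property name.
    public int Score { get; set; }
}
```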
Pros:

  • Different read capacity per index so can be tailored depending on how often each index will be accessed
  • Return only a subset of attributes to help performance

 
Cons:

  • Cannot create new indexes or edit existing ones after table creation
  • Limited to only 5 indexes
  • Can only perform equality on Hash Key
  • Custom mapping required, although the code above fixes this limitation

 
Dynamo has come a long way since it was first launched, and Global Secondary Indexes are a nice addition that brings some much-needed functionality.