William Liu

AWS Elastic MapReduce (EMR) and S3

Amazon EMR is a managed Hadoop framework for processing large amounts of data quickly and easily. EMR is well suited to tasks like log analysis, web indexing, and data transformations (ETL).

S3

Check if you need:

S3 Filtering

Remember that S3 doesn’t know about subfolders or subdirectories. A slash is just another character in a key. There’s only one level of containers, called buckets. Inside buckets are files, called objects, each identified by a key.

Instead, S3 uses a Prefix to filter the list. Only keys with a matching prefix are displayed.

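S3 also allows a delimiter, set with --delimiter=X, or with -d to use / as the delimiter. With a delimiter, keys that share the same text between the prefix and the next delimiter are rolled up into a single entry, which is what makes a listing look like it has folders.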
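
To make the prefix and delimiter behavior concrete, here is a minimal sketch in plain Python that imitates how a listing is filtered. It does not talk to S3; the function name list_keys and the sample keys are made up for illustration, and the real service also paginates results, which this ignores.

```python
def list_keys(keys, prefix="", delimiter=None):
    """Imitate an S3 listing over a flat collection of keys.

    Returns (contents, common_prefixes):
      contents        - keys that match the prefix and are listed individually
      common_prefixes - rolled-up entries that look like "folders"
    This is a hypothetical helper for illustration, not an AWS API.
    """
    contents = []
    common_prefixes = []
    for key in sorted(keys):
        # Only keys that start with the prefix are considered at all.
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll the key up to the first delimiter after the prefix.
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


# Sample keys (invented for this example). Note they are all flat strings.
keys = [
    "logs/2015/01/app.log",
    "logs/2015/02/app.log",
    "logs/readme.txt",
    "data/input.csv",
]

# Prefix only: every key under "logs/" is listed, however deep it looks.
print(list_keys(keys, prefix="logs/"))

# Prefix plus delimiter: deeper keys collapse into one "folder" entry.
print(list_keys(keys, prefix="logs/", delimiter="/"))
```

With only a prefix, all three logs/ keys come back. Adding / as the delimiter leaves logs/readme.txt as the only individual key and collapses the two 2015 keys into the single entry logs/2015/.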