YipitData recently acquired a dataset of Oscar-nominated movies from 1927 to 2014. The data is stored in an S3 bucket (s3://yipit-oscars-data). This repo explains how to extract and transform the data and load it into a CSV file.
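boto3 calls such as `list_objects_v2` take a bucket name (and optionally a key or prefix), not a full `s3://` URI. A minimal sketch of splitting the bucket URI above with the standard library; the helper name `split_s3_uri` is hypothetical and not part of this repo:

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc holds the bucket name; path holds the key prefix with a leading "/"
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    print(split_s3_uri("s3://yipit-oscars-data"))  # ('yipit-oscars-data', '')
```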
- An IDE or code editor (VS Code, Atom, etc.)
- Create an AWS account (https://portal.aws.amazon.com/billing/signup#/start/email) and set up all account settings (billing, payment methods, etc.)
- Python version 3 or greater is required
- AWS CLI is required
- IAM access keys (access key ID and secret access key) and access to the S3 bucket (s3://yipit-oscars-data) for connecting to AWS
- Create an IAM user (the deployment display name) via `Add user` and select `Programmatic access` as the AWS access type.
- Grant admin access to the user by selecting `AdministratorAccess`.
- Store the created user's `Access key ID` and `Secret access key` in a safe location; you will need them to configure the AWS CLI.
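One way to keep the keys out of source files is to export them as the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, which both the AWS CLI and boto3 read. A minimal sketch assuming that approach; the function names `read_keys_from_env` and `mask` are hypothetical:

```python
import os
from typing import Mapping, Optional, Tuple


def read_keys_from_env(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (access_key_id, secret_access_key) from environment variables."""
    env = os.environ if env is None else env
    key_id = env.get("AWS_ACCESS_KEY_ID", "")
    secret = env.get("AWS_SECRET_ACCESS_KEY", "")
    if not key_id or not secret:
        raise RuntimeError("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set")
    return key_id, secret


def mask(value: str) -> str:
    """Hide all but the last four characters so a key can be printed safely."""
    tail = value[-4:]
    return "*" * (len(value) - len(tail)) + tail
```

For example, `print(mask(read_keys_from_env()[0]))` shows which access key is active without exposing the full value.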
- Go to https://www.python.org and download the installer for your OS (Windows or macOS).
- Verify the install by running `python` or `python3.6` in a terminal; the interactive Python console should open.
- Confirm pip is installed for Python 3 using `pip3.6 --version` (use `pip3.6` rather than `pip`, which may point to Python 2.7).
- Install the AWS CLI using Homebrew on macOS (`brew install awscli`) or follow https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html for your machine's OS.
  - Linux:
    ```shell
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    sudo ./aws/install
    ```
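The install checks above can also be run from Python in one pass. A minimal sketch using only the standard library (the function name `check_environment` is hypothetical); run it with the same `python3` interpreter you plan to use for the ETL script:

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether Python 3, pip, and the AWS CLI appear to be installed."""
    return {
        # This repo requires Python 3 or greater
        "python3": sys.version_info >= (3,),
        # pip is importable by *this* interpreter (not a separate Python 2.7 one)
        "pip": importlib.util.find_spec("pip") is not None,
        # The `aws` executable is on PATH (same idea as `which aws`)
        "aws_cli": shutil.which("aws") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```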
- Configure the AWS CLI by running `aws configure` in a terminal and entering your region (`us-east-1`) and the access keys provided by YipitData.
- Install the Boto3 library for Python using `pip3 install boto3` (or `pip install boto3` if `pip` already points to Python 3).
- Install the required Python libraries with `pip3 install boto3 pandas`. (`re` and `csv` are part of the Python standard library and do not need to be installed with pip.)
- Make sure the AWS CLI is configured with valid credentials so that you do not run into permission errors when accessing the S3 bucket.
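To confirm the connection from Python, a minimal sketch like the following lists the objects in the bucket. It assumes boto3's `list_objects_v2` paginator (a single call returns at most 1,000 keys, so pages are walked); the function name `list_keys` is hypothetical, and the client is passed in so the function can be exercised without AWS access:

```python
from typing import Any, List


def list_keys(s3_client: Any, bucket: str, prefix: str = "") -> List[str]:
    """Return every object key under `prefix` in `bucket`.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty bucket or prefix have no "Contents" entry
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

For example, `list_keys(boto3.client("s3"), "yipit-oscars-data")`. If this raises a `botocore.exceptions.ClientError` with an `AccessDenied` code, the credentials or IAM permissions from the setup steps are the first thing to check.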
- Run `s3_operations.py` from `source-code.zip` (for example, extract the archive and run `python3 s3_operations.py`).
- Once the script has run, check that the CSV file was created in the working directory.
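The output check in the last step can be scripted. A minimal sketch that lists every CSV file in a directory with its row count; the function name `summarize_csv_outputs` is hypothetical, and it assumes the output files are UTF-8 encoded with a single header row:

```python
import csv
from pathlib import Path
from typing import Dict


def summarize_csv_outputs(directory: str = ".") -> Dict[str, int]:
    """Map each CSV file in `directory` to its number of data rows (header excluded)."""
    summary: Dict[str, int] = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" lets the csv module handle quoted fields containing newlines
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        summary[path.name] = max(len(rows) - 1, 0)
    return summary


if __name__ == "__main__":
    results = summarize_csv_outputs()
    if not results:
        print("No CSV files found; check that s3_operations.py ran without errors.")
    for name, count in results.items():
        print(f"{name}: {count} rows")
```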
- Test that the AWS CLI was installed correctly by running `which aws` or `aws --version` in your terminal.
- Ensure the credentials are configured properly by running `aws configure`. If your keys and region appear as the current values without you having to enter them again, the setup was successful.
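As a non-interactive alternative, the files that `aws configure` writes can be inspected directly: for the default profile it stores the keys under `[default]` in `~/.aws/credentials` and the region under `[default]` in `~/.aws/config`. A minimal sketch that reports anything missing (the function name `missing_aws_settings` is hypothetical):

```python
import configparser
from pathlib import Path
from typing import List, Optional


def missing_aws_settings(aws_dir: Optional[Path] = None, profile: str = "default") -> List[str]:
    """List the settings `aws configure` should have written but that are absent."""
    if aws_dir is None:
        aws_dir = Path.home() / ".aws"
    # interpolation=None so "%" characters in values are read literally
    credentials = configparser.ConfigParser(interpolation=None)
    credentials.read(str(aws_dir / "credentials"))  # missing files are silently skipped
    config = configparser.ConfigParser(interpolation=None)
    config.read(str(aws_dir / "config"))

    missing = []
    for key in ("aws_access_key_id", "aws_secret_access_key"):
        if not credentials.get(profile, key, fallback=""):
            missing.append(f"credentials:{key}")
    # In the config file, non-default profiles are named "[profile <name>]"
    section = profile if profile == "default" else f"profile {profile}"
    if not config.get(section, "region", fallback=""):
        missing.append("config:region")
    return missing
```

An empty list means the keys and region are in place. The AWS CLI's own `aws configure list` command also prints the current values without prompting.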