tag_counter example failing on EMR #9

Open
umerebryx opened this issue Jul 24, 2016 · 0 comments

I am trying to run the tag_counter example on an EMR cluster, and I am running into the following error:

python tag_counter.py -r emr s3://crawlimages/input/input/test-1.warc --conf-path mrjob.conf --no-output
Using s3://mrjob-de53a6f5e25deffa/tmp/ as our temp dir on S3
Creating temp directory /tmp/tag_counter.ebryx.20160724.105200.757543
Copying local files to s3://mrjob-de53a6f5e25deffa/tmp/tag_counter.ebryx.20160724.105200.757543/files/...
Created new cluster j-2GOGDCNPYW5K2
Waiting for step 1 of 1 (s-2EMPXS3LSAS25) to complete...
PENDING (cluster is STARTING)
PENDING (cluster is STARTING)
PENDING (cluster is STARTING)
PENDING (cluster is STARTING)
PENDING (cluster is STARTING)
PENDING (cluster is STARTING)
PENDING (cluster is STARTING)
PENDING (cluster is STARTING)
PENDING (cluster is STARTING: Configuring cluster software)
PENDING (cluster is STARTING: Configuring cluster software)
PENDING (cluster is BOOTSTRAPPING: Running bootstrap actions)
PENDING (cluster is BOOTSTRAPPING: Running bootstrap actions)
PENDING (cluster is BOOTSTRAPPING: Running bootstrap actions)
RUNNING for 19.3s
RUNNING for 51.9s
RUNNING for 84.9s
RUNNING for 117.9s
RUNNING for 151.0s
RUNNING for 183.9s
FAILED
Cluster j-2GOGDCNPYW5K2 is TERMINATING: Shut down as step failed
Attempting to fetch counters from logs...
Waiting for cluster (j-2GOGDCNPYW5K2) to terminate...
TERMINATING
TERMINATING
TERMINATING
TERMINATED_WITH_ERRORS
Looking for step log in s3://mrjob-de53a6f5e25deffa/tmp/logs/j-2GOGDCNPYW5K2/steps/s-2EMPXS3LSAS25...
Looking for step log in s3://mrjob-de53a6f5e25deffa/tmp/logs/j-2GOGDCNPYW5K2/steps/s-2EMPXS3LSAS25...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Looking for step log in s3://mrjob-de53a6f5e25deffa/tmp/logs/j-2GOGDCNPYW5K2/steps/s-2EMPXS3LSAS25...
Looking for step log in s3://mrjob-de53a6f5e25deffa/tmp/logs/j-2GOGDCNPYW5K2/steps/s-2EMPXS3LSAS25...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed

Terminating cluster: j-2GOGDCNPYW5K2

Nothing in the logs in the bucket suggests what went wrong.
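
To confirm that the right S3 prefix is being inspected, here is a minimal sketch that pulls the temp dir, cluster ID, and step ID out of the runner output and rebuilds the log prefixes. The function name `log_prefixes` and the sample constant are hypothetical helpers, not part of mrjob; the step prefix follows the "Looking for step log in ..." lines above, and the sketch assumes the output keeps that wording.

```python
import re

# Lines copied from the runner output above.
RUNNER_OUTPUT = """\
Using s3://mrjob-de53a6f5e25deffa/tmp/ as our temp dir on S3
Created new cluster j-2GOGDCNPYW5K2
Waiting for step 1 of 1 (s-2EMPXS3LSAS25) to complete...
"""


def _find(pattern, text):
    """Return the first capture group, or raise if the line is missing."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError("pattern not found in runner output: " + pattern)
    return match.group(1)


def log_prefixes(output):
    """Hypothetical helper: rebuild S3 log prefixes from mrjob runner output."""
    tmp_dir = _find(r"Using (s3://\S+?)/? as our temp dir", output)
    cluster = _find(r"Created new cluster (j-[A-Z0-9]+)", output)
    step = _find(r"\((s-[A-Z0-9]+)\)", output)
    # Layout taken from the "Looking for step log in ..." lines in the output.
    base = "{}/logs/{}/".format(tmp_dir, cluster)
    return {"cluster": base, "step": "{}steps/{}/".format(base, step)}


if __name__ == "__main__":
    for name, prefix in sorted(log_prefixes(RUNNER_OUTPUT).items()):
        print(name, prefix)
```

The printed prefixes can then be listed recursively, for example with `aws s3 ls --recursive`, to see whether anything other than the step logs was written under the cluster prefix.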

My mrjob.conf file is:

runners:
  emr:
    aws_region: us-west-1
    # Either set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    # or set the two variables below
    aws_access_key_id: xxxxxx
    aws_secret_access_key: xxxxx
    # For more control, it's highly recommended to add your key pair
    #ec2_key_pair:
    #ec2_key_pair_file:
    #ssh_tunnel_to_job_tracker: true

    ec2_instance_type: m1.xlarge
    ec2_master_instance_type: m1.xlarge
    ec2_master_instance_bid_price: '0.1'
    ec2_core_instance_bid_price: '0.1'
    num_ec2_instances: 2

    # EMR comes with Python 2.6 by default -- installing Python 2.7 takes a while but might be necessary
    # We also install packages needed for streaming compressed files from S3 or reading WARC files
    # There's a newer AMI version but it has issues with the released stable mrjob
    ami_version: 3.0.4
    interpreter: python2.7
    bootstrap:
    - sudo yum --releasever=2014.09 install -y python27 python27-devel gcc-c++
    - sudo python2.7 get-pip.py#
    - sudo pip2.7 install boto mrjob simplejson warc
    - sudo pip2.7 install https://github.com/commoncrawl/gzipstream/archive/master.zip

Any help solving this would be really appreciated.
