reqparams page_num gets overwritten when placing order #188
A bit of background: the total number of granules in my query result is usually >400, since I am interested in a fairly large geographic region. However, instead of ordering all 400+ granules, I would like to order and download one granule at a time so I can write and test my code. I tried to achieve this by setting both 'page_size' and 'page_num' to 1 in my query, but order_granules still orders all available granules regardless of the 'page_size' and 'page_num' values. I think this is because 'page_num' gets overwritten by these lines of code in granules.place_order (ln#245-247):

    self.get_avail(
        CMRparams, reqparams
    )  # this way the reqparams['page_num'] is updated

where 'page_num' is updated in granules.get_avail by these lines (ln#171-173):

    # Collect results and increment page_num
    self.avail.extend(results["feed"]["entry"])
    reqparams["page_num"] += 1

I understand that this is probably designed to automatically calculate and update 'page_num' so that users don't have to worry about it. But in my case, I would like to have more control, and I think this could be solved by using a copy of 'page_num' in the while loop in granules.get_avail instead of updating the original value. I am fairly new to Python, so please let me know if my understanding is incorrect. Thank you.
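[Editor's note: for reference, here is a minimal sketch of the change proposed above, assuming a simplified, hypothetical stand-in for granules.get_avail rather than icepyx's actual implementation. The paging loop advances a local counter, so the caller's reqparams['page_num'] is never overwritten.]

    import requests

    CMR_GRANULE_URL = 'https://cmr.earthdata.nasa.gov/search/granules.json'

    def get_avail(CMRparams, reqparams):
        """Collect all matching granules without mutating reqparams."""
        avail = []
        page_num = reqparams['page_num']  # local copy; the caller's dict stays intact
        while True:
            params = {**CMRparams, **reqparams}
            params['page_num'] = page_num
            results = requests.get(CMR_GRANULE_URL, params=params).json()
            entries = results['feed'].get('entry', [])
            if not entries:
                break  # no results on this page; we've paged past the end
            avail.extend(entries)
            page_num += 1  # increment the copy, not reqparams['page_num']
        return avail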
Hello @wenjieji86! Thanks for posting the problem you're having.
I would recommend limiting your query object parameters to return only the number of granules you'd like (for instance, your example only returns three granules). In an upcoming release (pull request #148 is awaiting review, along with a few other changes being pushed through development), you'll be able to download a single granule by specifying ground tracks and cycles.
You're correct that these parameters are automatically updated in the code. However, they manage the ordering and download process itself (to make sure that the subsetter goes through and orders all of the possible granules); they do not have any influence over the number of granules returned. The only way to alter the number of granules returned is by changing the scope of your query object (i.e. using narrower dates/times/spatial extents). If it helps, you can create multiple query objects within your code, e.g. one per sub-region or sub-period, as in the sketch below.
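[Editor's note: a hedged sketch of that idea, assuming icepyx's Query class; the product, extent, and dates below are purely illustrative.]

    # Sketch: split one large request into several smaller query objects, each
    # scoped to a narrower date range (extent and dates are illustrative only).
    import icepyx as ipx

    short_name = 'ATL06'
    spatial_extent = [-55, 68, -48, 71]  # [lon_min, lat_min, lon_max, lat_max]

    region_a = ipx.Query(short_name, spatial_extent, ['2019-02-20', '2019-02-24'])
    region_b = ipx.Query(short_name, spatial_extent, ['2019-02-25', '2019-02-28'])

    # Each object can then be ordered and downloaded independently, e.g.:
    # region_a.order_granules()
    # region_a.download_granules('./data')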
Hi @JessicaS11, thank you for your reply, but I don't agree that the only way to alter the number of granules returned is by changing the scope of the query object.
According to the CMR API documentation, 'page_num' and 'page_size' determine which page of the matching results is returned.
So setting a specific 'page_num' and 'page_size' should give me control over which subset of the available granules I want to order/download. That said, I think making this change would require some work, because the current algorithm orders everything from page 1 to the last available page.
Anyway, I think the current algorithm works fine, and as you said, if I want to do a test run of my scripts, I can always limit my search results by using a smaller spatial extent or a shorter time interval. Sorry for making things complicated, and thanks again for your reply.
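[Editor's note: to illustrate the paging behavior described in the CMR documentation, here is a small standalone request against the public CMR search endpoint; the product and page values are examples only.]

    # Sketch: page_size/page_num select a specific slice of the matching granules.
    # Requesting page_size=1, page_num=3 returns only the third matching granule.
    import requests

    resp = requests.get(
        'https://cmr.earthdata.nasa.gov/search/granules.json',
        params={'short_name': 'ATL06',  # example product
                'page_size': 1, 'page_num': 3},
    )
    resp.raise_for_status()
    entries = resp.json()['feed']['entry']
    print(len(entries))          # -> 1
    print(entries[0]['title'])   # the single granule on 'page' 3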
Thanks for clarifying, @wenjieji86. That makes sense; I was unaware that the CMR API does enable this usage, so thanks for sharing that.
I suspect you're correct that this would be a non-trivial change to make in how icepyx handles ordering. One of our outstanding issues (#169) is to look at the new Python CMR library to see if we should adopt it under the hood for interfacing with the CMR API. If that ends up happening, that could be a good time to think about enabling this functionality. In the meantime, I'm glad there's an easy workaround!
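[Editor's note: for reference, a rough sketch of what querying through that library might look like; this assumes the python_cmr package's GranuleQuery builder interface, so check its documentation for the actual API.]

    # Rough sketch using the python_cmr library's query builder (assumed interface).
    from cmr import GranuleQuery

    api = GranuleQuery()
    api.short_name('ATL06').temporal('2019-02-20T00:00:00Z', '2019-02-28T23:59:59Z')
    print(api.hits())      # total number of matching granules
    granules = api.get(3)  # retrieve only the first three results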
Hi @JessicaS11, thanks for your reply; I would love to contribute any way I can. But since I am still learning to use GitHub (e.g. pulling, forking, etc.), some instructions on how to contribute to the code repository would be helpful! In the meantime, I will post my workaround for this issue here in case other people are interested. My understanding of how CMR handles 'page_size' and 'page_num' is as described in my earlier comment.
Here is my code, based on the NSIDC-Data-Access-Notebook:

    # Imports used below. Variables such as granules, page_size, page_start,
    # page_end, page_num, session, base_url, data_folder, and the subsetting
    # parameters are assumed to be defined in earlier notebook cells, as in
    # the NSIDC notebook.
    import io
    import os
    import pprint
    import re
    import time
    import zipfile
    import xml.etree.ElementTree as ET
    import numpy as np

    # update page_num based on # of available granules and page_size
    page_end_tmp = page_start + int(np.ceil(len(granules)/page_size)) - 1
    if page_end_tmp < page_end:
        tmp_index = page_num.index(page_end_tmp)
        page_num_tmp = page_num[:tmp_index + 1]
    else:
        page_num_tmp = page_num

    # options to reset page_size and page_num
    print('There will be', len(page_num_tmp), 'total order(s) with', page_size,
          'granules per order: page(s)', page_start, '-', int(page_end_tmp))
    page_size_change = input('Change page size? (y/n)')
    if page_size_change == 'y':
        page_size_order = int(input('Page size:'))
        page_start_tmp = page_start
        page_end_tmp = page_start_tmp + int(np.ceil(len(granules)/page_size_order)) - 1
        page_num_tmp = list(range(page_start_tmp, page_end_tmp + 1))
        print('There will be', len(page_num_tmp), 'total order(s) with', page_size_order,
              'granules per order: page(s)', page_start_tmp, '-', page_end_tmp)
        page_num_change = input('Change page number? (y/n)')
        if page_num_change == 'y':
            page_start_order = int(input('Start page:'))
            page_end_order = int(input('End page:'))
            page_num_order = list(range(page_start_order, page_end_order + 1))
            print('There will be', len(page_num_order), 'total order(s) with', page_size_order,
                  'granules per order: page(s)', page_start_order, '-', page_end_order)
        else:
            page_start_order = page_start_tmp
            page_end_order = page_end_tmp
            page_num_order = page_num_tmp
            print('There will be', len(page_num_order), 'total order(s) with', page_size_order,
                  'granules per order: page(s)', page_start_order, '-', page_end_order)
    else:
        page_size_order = page_size
        page_num_change = input('Change page number? (y/n)')
        if page_num_change == 'y':
            page_start_order = int(input('Start page:'))
            page_end_order = int(input('End page:'))
            page_num_order = list(range(page_start_order, page_end_order + 1))
            print('There will be', len(page_num_order), 'total order(s) with', page_size_order,
                  'granules per order: page(s)', page_start_order, '-', page_end_order)
        else:
            page_start_order = page_start
            page_end_order = page_end_tmp
            page_num_order = page_num_tmp
            print('There will be', len(page_num_order), 'total order(s) with', page_size_order,
                  'granules per order: page(s)', page_start_order, '-', page_end_order)

    subset_params = {
        'short_name': short_name,
        'version': version,
        'temporal': temporal,
        'time': temporal,
        'polygon': spatial_bound_simplified,
        'Boundingshape': spatial_bound_geojson,
        'page_size': page_size_order,   # use the (possibly updated) page size
        'page_num': page_num_order[0],  # start from the first requested page
        'request_mode': request_mode,
    }
    print('Submitting', len(page_num_order), 'total order(s) of', short_name, 'for processing...')

    # different access methods depending on request mode
    if request_mode == 'async':
        # request data service for each page number, and unzip outputs
        for i in range(len(page_num_order)):
            page_val = i + 1
            print('Order:', page_val)
            print(subset_params)
            # for all requests other than spatial file upload, use get function
            request = session.get(base_url, params=subset_params)
            print('Request HTTP response:', request.status_code)
            # raise bad request: loop will stop for bad response code
            request.raise_for_status()
            esir_root = ET.fromstring(request.content)
            # look up order ID
            orderlist = []
            for order in esir_root.findall("./order/"):
                orderlist.append(order.text)
            orderID = orderlist[0]
            print('Order ID:', orderID)
            # create status URL
            statusURL = base_url + '/' + orderID
            print('Order status URL:', statusURL)
            # find order status
            request_response = session.get(statusURL)
            print('HTTP response from order response URL:', request_response.status_code)
            request_response.raise_for_status()
            request_root = ET.fromstring(request_response.content)
            statuslist = []
            for status in request_root.findall("./requestStatus/"):
                statuslist.append(status.text)
            status = statuslist[0]
            print('Submitting order', page_val, '...')
            print('Order status:', status)
            # continue loop while request is still processing
            while status == 'pending' or status == 'processing':
                time.sleep(20)
                loop_response = session.get(statusURL)
                loop_response.raise_for_status()
                loop_root = ET.fromstring(loop_response.content)
                # find status
                statuslist = []
                for status in loop_root.findall("./requestStatus/"):
                    statuslist.append(status.text)
                status = statuslist[0]
                print('Order status:', status)
                if status == 'pending' or status == 'processing':
                    continue
            # provide complete_with_errors error message
            if status == 'complete_with_errors' or status == 'failed':
                messagelist = []
                for message in loop_root.findall("./processInfo/"):
                    messagelist.append(message.text)
                print('error messages:')
                pprint.pprint(messagelist)
            # download zipped order if status is complete or complete_with_errors
            if status == 'complete' or status == 'complete_with_errors':
                downloadURL = 'https://n5eil02u.ecs.nsidc.org/esir/' + orderID + '.zip'
                print('Zip download URL: ', downloadURL)
                print('Begin downloading order', page_val, 'output...')
                zip_response = session.get(downloadURL)
                zip_response.raise_for_status()
                with zipfile.ZipFile(io.BytesIO(zip_response.content)) as z:
                    z.extractall(data_folder)
                print('Order', page_val, 'is complete.')
            else:
                print('Order failed.')
            subset_params['page_num'] += 1  # update page_num for the next order
    else:
        # synchronous/streaming request: the response body is the data itself
        for i in range(len(page_num_order)):
            page_val = i + 1
            print('Order:', page_val)
            print('Requesting...')
            request = session.get(base_url, params=subset_params)
            print('HTTP response from order response URL:', request.status_code)
            request.raise_for_status()
            d = request.headers['content-disposition']
            fname = re.findall('filename=(.+)', d)
            dirname = os.path.join(data_folder, fname[0].strip('\"'))
            print('Downloading...')
            with open(dirname, 'wb') as f:
                f.write(request.content)
            print('Order', page_val, 'is complete.')
            subset_params['page_num'] += 1  # update page_num for the next order
@wenjieji86 Apologies for not following up with more guidance on how to contribute. We're working on improving our resources for new contributors (so that I don't have to follow up manually beyond what's on our docs pages), and I'm trying to get better about staying on top of open issues. In the meantime, some changes introduced in #87 made it easy to enable single-page ordering as previously discussed in this issue, so I'm going to link this issue to that PR, which will automatically close the issue once it's reviewed and merged. I'd love to have your feedback on how it works after the updates!