
reqparams page_num gets overwritten when placing order #188

Closed
wenjieji86 opened this issue Mar 19, 2021 · 5 comments · Fixed by #87

@wenjieji86
Contributor

A bit of background: the total number of granules in my query result is usually >400, since I am interested in a fairly large geographic region. However, instead of ordering all 400+ granules, I would like to order and download one granule at a time so I can write/test my code. I tried to achieve this by setting both 'page_size' and 'page_num' to 1 in my query, but order_granules still orders all available granules regardless of the 'page_size' and 'page_num' values.

I think this is because 'page_num' gets overwritten by these lines of code (in granules.place_order, ln#245-247):
self.get_avail(
CMRparams, reqparams
) # this way the reqparams['page_num'] is updated

where 'page_num' is updated in granules.get_avail by this code (ln#171-173):

# Collect results and increment page_num
self.avail.extend(results["feed"]["entry"])
reqparams["page_num"] += 1

I understand that these are probably designed to automatically calculate/update 'page_num' so that users don't have to worry about it. In my case, however, I would like more control, and I think this could be solved by using a copy of 'page_num' in the while loop in granules.get_avail instead of updating the original value. I am fairly new to Python, so please let me know if my understanding is incorrect.
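To illustrate, here is a rough sketch of the change I have in mind (this is only a stand-in for granules.get_avail, not the actual icepyx code; CMR_URL and self.session are placeholders):

CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"

def get_avail(self, CMRparams, reqparams):
    page_num = reqparams["page_num"]  # work on a local copy
    while True:
        params = {**CMRparams, **reqparams, "page_num": page_num}
        results = self.session.get(CMR_URL, params=params).json()
        entries = results["feed"]["entry"]
        if not entries:
            break  # no more pages to collect
        self.avail.extend(entries)
        page_num += 1  # increment the copy; reqparams["page_num"] stays unchanged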

Thank you.

[Screenshot: example query output]

@JessicaS11
Member

Hello @wenjieji86! Thanks for posting the problem you're having.

I would like to order and download 1 granule at a time so I can write/test my code.

I would recommend limiting your query object parameters to return only the number of granules you'd like (for instance, your example returns only three granules). In an upcoming release (pull request #148 is awaiting review, along with a few other changes being pushed through development), you'll be able to download just one granule by specifying ground tracks and cycles.

I tried to achieve this by setting both 'page_size' and 'page_num' to 1 in my query.

You're correct that these parameters are automatically updated in the code. However, they manage the ordering and download process itself (making sure the subsetter goes through and orders all of the possible granules); they do not have any influence over the number of granules returned. The only way to alter the number of granules returned is to change the scope of your query object (i.e. use narrower dates/times/spatial extents). If it helps, you can create multiple query objects within your code (e.g. my_query_all and my_query_test) to search for all the granules you may eventually use, and then only actually complete a download with a smaller subset for testing.
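For example, a minimal sketch of that two-query pattern (the extents and dates below are placeholders; adjust them to your region of interest):

import icepyx as ipx

# full query for the eventual analysis (placeholder extent/dates)
my_query_all = ipx.Query("ATL06", [-55, 68, -48, 71], ["2019-02-20", "2019-02-28"])

# much narrower query so only a handful of granules is returned for testing
my_query_test = ipx.Query("ATL06", [-50, 69, -49.5, 69.5], ["2019-02-22", "2019-02-23"])

print(my_query_test.avail_granules())  # confirm the count before ordering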

@wenjieji86
Contributor Author

Hi @JessicaS11, thank you for your reply. However, I don't agree with the comment that:

They do not have any influence over the number of granules returned.

According to CMR API documentation:

page_num, defaulting to 1, chooses a "page" of items to return. If a search matched 50 items the parameters page_num=3&page_size=5 would return the 11th item through the 15th item.

So setting a specific page_num and page_size should give me control over which 'subset' of the available granules I want to order/download.
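For example, hitting the CMR search endpoint directly (a minimal sketch; the short_name is just an example):

import requests

# per the docs quoted above, page_size=5 & page_num=3 should return only
# the 11th through 15th matching granules
resp = requests.get(
    "https://cmr.earthdata.nasa.gov/search/granules.json",
    params={"short_name": "ATL06", "page_size": 5, "page_num": 3},
)
resp.raise_for_status()
print(len(resp.json()["feed"]["entry"]))  # at most 5 entries, all from "page" 3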

That being said, I think making this change would require some work, because the current algorithm orders everything from page 1 through page_num instead of ordering only the granules on page_num. This happens in granules.place_order():

request_params.update({"page_num": page_val})

Anyway, I think the current algorithm works fine, and like you said, if I want to do a test run of my scripts, I can always limit my search results by using a smaller spatial extent or a shorter time interval. Sorry for making things complicated, and thanks again for your reply.

@JessicaS11
Member

JessicaS11 commented Mar 23, 2021

Thanks for clarifying, @wenjieji86. That makes sense; I was unaware that the CMR API does enable this usage, so thanks for sharing that.

I think making this change would require some work because the way the current algorithm works is it will order everything from page 1 to page_num instead of just ordering granules in page_num, which is implemented in the granules.place_order()

I suspect you're correct that this would be a non-trivial change to how icepyx handles ordering. One of our outstanding issues (#169) is to look at the new Python CMR library and see if we should adopt it under the hood for interfacing with the CMR API. If that ends up happening, that could be a good time to think about enabling this functionality. In the meantime, I'm glad there's an easy workaround!
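For anyone curious, a quick sketch of what querying through that library looks like (based on the python-cmr GranuleQuery interface; this is not something icepyx uses yet):

from cmr import GranuleQuery

# build a granule search with python-cmr's fluent interface
api = GranuleQuery()
api.short_name("ATL06").bounding_box(-55, 68, -48, 71)
granules = api.get(10)  # fetch up to 10 matching granules
print(len(granules))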

JessicaS11 reopened this Mar 23, 2021
@wenjieji86
Contributor Author

wenjieji86 commented Mar 23, 2021

Hi @JessicaS11, thanks for your reply. I would love to contribute any way I can, but since I am still learning to use GitHub (e.g. pulling, forking, etc.), some instructions on how to contribute to the code repository would be helpful!

In the meantime, I will post my workaround for this issue here in case other people are interested:

My understanding of how CMR works with page_num and page_size is that it will return only ONE page (page_num), which will contain page_size granules from your search result. So when querying available granules with a temporal range and/or spatial extent, the number of granules returned (say 10, if you set page_size to 10) will be the same whether you set page_num to 1 or 10, but those 10 granules will be different because they are on different pages (page 1 vs. page 10). I guess in most cases people will simply want to download all available granules (in which case page_num and page_size don't really matter). Nevertheless, I think it would still be nice to have more control over which "page(s)" of data we want to download, especially since the CMR API seems to be designed that way.

  • When querying, page_num and page_size are less important because in most cases users are interested in the total number of available granules over their AOI, so no change is needed there.
  • However, users may not want to download/order all of the available granules, especially when there are hundreds over their AOI. Instead, they may want to download either a single page or a slice of pages (e.g. pages 5-10).
  • This can be realized by giving users an option to reset page_num and page_size based on the total number of available granules: instead of setting the page_num parameter, they are asked to set page_start and page_end.

Here is my code based on the NSIDC-Data-Access-Notebook:

# Assumed defined in earlier notebook cells: granules (list of CMR search
# results), session (authenticated requests.Session), base_url, data_folder,
# short_name, version, temporal, spatial_bound_simplified,
# spatial_bound_geojson, request_mode, page_size, page_start, page_end,
# and page_num = list(range(page_start, page_end + 1))
import io
import os
import pprint
import re
import time
import zipfile
import xml.etree.ElementTree as ET

import numpy as np

# update page_num based on # of available granules and page_size
page_end_tmp = page_start + int(np.ceil(len(granules)/page_size)) - 1

if page_end_tmp < page_end:
    tmp_index = page_num.index(page_end_tmp)
    page_num_tmp = page_num[:tmp_index + 1]
else:
    page_num_tmp = page_num

# options to reset page_size and page_num
print('There will be',len(page_num_tmp),'total order(s) with',page_size,'granules per order: page',page_start,'-',int(page_end_tmp))
page_size_change = input('Change page size? (y/n)')

if page_size_change == 'y':
    page_size_order = int(input('Page size:'))
    page_start_tmp = page_start
    page_end_tmp = page_start_tmp + int(np.ceil(len(granules)/page_size_order)) - 1
    page_num_tmp = list(range(page_start_tmp,page_end_tmp+1))
    print('There will be',len(page_num_tmp),'total order(s) with',page_size_order,'granules per order: page(s)',page_start_tmp, '-', page_end_tmp)
    
    page_num_change = input('Change page number? (y/n)')
    
    if page_num_change == 'y':
        page_start_order = int(input('Start page:'))
        page_end_order = int(input('End page:'))
        page_num_order = list(range(page_start_order,page_end_order+1))
        print('There will be',len(page_num_order),'total order(s) with',page_size_order,'granules per order: page(s)',page_start_order,'-',page_end_order)
    else:
        page_start_order = page_start_tmp
        page_end_order=page_end_tmp
        page_num_order=page_num_tmp
        print('There will be',len(page_num_order),'total order(s) with',page_size_order,'granules per order: page(s)',page_start_order,'-',page_end_order)
else:
    page_size_order = page_size
    page_num_change = input('Change page number (y/n)')
    
    if page_num_change == 'y':
        page_start_order = int(input('Start page:'))
        page_end_order = int(input('End page:'))
        page_num_order = list(range(page_start_order,page_end_order+1))
        print('There will be',len(page_num_order),'total order(s) with',page_size_order,'granules per order: page(s)',page_start_order,'-',page_end_order)
    else:
        page_start_order = page_start
        page_end_order = page_end_tmp
        page_num_order = page_num_tmp
        print('There will be',len(page_num_order),'total order(s) with',page_size_order,'granules per order: page(s)',page_start_order,'-',page_end_order)
subset_params={
    'short_name': short_name,
    'version': version,
    'temporal': temporal,
    'time': temporal,
    'polygon': spatial_bound_simplified,
    'Boundingshape': spatial_bound_geojson,
    'page_size': page_size_order, # use the (possibly updated) page size
    'page_num': page_num_order[0], # start ordering from the first requested page
    'request_mode': request_mode,
}

print('Submitting',len(page_num_order),'total order(s) of',short_name,'for processing...')

# different access methods depending on request mode
if request_mode=='async':
    # request data service for each page number, and unzip outputs
    for i in range(len(page_num_order)):
        page_val=i+1
        print('Order:',page_val)
        
        print(subset_params)
        request=session.get(base_url,params=subset_params) # for all requests other than spatial file upload, use get function
        print('Request HTTP response:',request.status_code)
        
        # raise bad request: loop will stop for bad response code
        request.raise_for_status()
        esir_root=ET.fromstring(request.content)
        
        # look up order ID
        orderlist=[]   
        for order in esir_root.findall("./order/"):
            orderlist.append(order.text)
        orderID=orderlist[0]
        print('Order ID:',orderID)
        
        # create status URL
        statusURL=base_url+'/'+orderID
        print('Order status URL:',statusURL)
        
        # find order status
        request_response=session.get(statusURL)    
        print('HTTP response from order response URL:',request_response.status_code)
        
        request_response.raise_for_status()
        request_root=ET.fromstring(request_response.content)
        statuslist=[]
        for status in request_root.findall("./requestStatus/"):
            statuslist.append(status.text)
        status=statuslist[0]
        print('Submitting order',page_val,'...')
        print('Order status:',status)
        
        # continue polling while request is still processing
        loop_root=request_root # so loop_root exists even if the loop below never runs
        while status=='pending' or status=='processing':
            time.sleep(20)
            loop_response=session.get(statusURL)
            loop_response.raise_for_status()
            loop_root=ET.fromstring(loop_response.content)
            
            # find status
            statuslist=[]
            for status in loop_root.findall("./requestStatus/"):
                statuslist.append(status.text)
            status=statuslist[0]
            print('Order status:',status)
        
        # provide complete_with_errors error message
        if status=='complete_with_errors' or status=='failed':
            messagelist=[]
            for message in loop_root.findall("./processInfo/"):
                messagelist.append(message.text)
            print('error messages:')
            pprint.pprint(messagelist)
        
        # download zipped order if status is complete or complete_with_errors
        if status=='complete' or status=='complete_with_errors':
            downloadURL='https://n5eil02u.ecs.nsidc.org/esir/'+orderID+'.zip'
            print('Zip download URL: ',downloadURL)
            print('Begin downloading order',page_val,'output...')
            zip_response=session.get(downloadURL)
            zip_response.raise_for_status()
            with zipfile.ZipFile(io.BytesIO(zip_response.content)) as z:
                z.extractall(data_folder)
            print('Order',page_val,'is complete.')
        else:
            print('Order failed.')
        
        subset_params['page_num']+=1 # update page_num
else:
    for i in range(len(page_num_order)):
        page_val=i+1
        print('Order:',page_val)
        print('Requesting...')
        request=session.get(base_url,params=subset_params)
        print('HTTP response from order response URL:',request.status_code)
        request.raise_for_status()
        d=request.headers['content-disposition']
        fname=re.findall('filename=(.+)',d)
        dirname=os.path.join(data_folder,fname[0].strip('\"'))
        print('Downloading...')
        with open(dirname,'wb') as f:
            f.write(request.content)
        print('Order',page_val,'is complete.')
        
        subset_params['page_num']+=1 # update page_num

@JessicaS11
Member

@wenjieji86 Apologies for not following up with more guidance on how to contribute. We're working on improving our resources for new contributors so that I don't have to follow up manually beyond what's on our docs pages, and I'm trying to get better about staying on top of open issues.

In the meantime, some changes introduced in #87 made it super easy to enable single-page ordering as previously discussed in this issue, so I'm going to link this issue to that PR which will automatically close the issue when it's successfully reviewed and then merged. I'd love to have your feedback on how it works after the updates!

JessicaS11 linked a pull request Feb 3, 2022 that will close this issue