Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a remote testing program #3889

Closed

Conversation

dmitry-ganyushin
Copy link
Contributor

A testing program to read 1D, 3D arrays and slices from 3D arrays. In the current state...

{
if (!variables_in.empty() && var_name.first != variables_in[0])
continue;
adios2::Variable<double> var = io.InquireVariable<double>(var_name.first);
Copy link
Member

@eisenhauer eisenhauer Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InquireVariable will return NULL if the variable exists but isn't of type double. You should probably check for this possibility rather than segfaulting on the next line.

Actually, why do AvailableVariables if you're only matching a single name and requiring it to be a double? Just call InquireVariable with the name you want. No reason to have this loop here at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, read functions should be generic, but currently supports only one type. The loop was made mainly for future development, for matching multiple variables from the input list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but I don't think you'd ever actually do it that way with multiple variables. Those would pretty naturally come in as a list that you iterated over and do InqVar on, which would be vastly easier than iterating through all the variables and then seeing if each one was listed. Alternatively, if you actually supported types you'd call VariableType first to see if it existed and then a templated InqVar to get the actual handle. So, I don't think you'd ever use this loop structure...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the expression
adios2::Variable<double> var = io.InquireVariable<double>(var_name.first);
var cannot return nullptr or NULL. Do we have a test for that? Maybe things got broken.
The compiler does not understand if (var != nullptr)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because var is not a pointer here so you can't compare it to null. But the Variable class does have a boolean operator, so you can do if (var) instead of if (var != nullptr). That's probably the test that you want (and it does appear in many, many places in our tests).

@vicentebolea vicentebolea removed their request for review November 2, 2023 01:51
Copy link
Member

@eisenhauer eisenhauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the adios_iotest directory may not a good place for this. adios_iotest is a complex tool, will this fit in there in some way? Or was this placement another WIP? I guess one way that it might fit in was if adios_iotest could be used to generate the data upon which the read test might rely. I know this has been used with data generated elsewhere, but to actually be useful as some kind of standard performance test that is deployed in a variety of situations (which was the goal), then it needs data, even if its dummy data.

Copy link
Contributor

@anagainaru anagainaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a testing for the CI or just a tool for people to use? If it's for manual performance testing, should we use TAU for performance measurements in addition to the time you are plotting for the entire test?

source/utils/adios_iotest/adios_remote_read.cpp Outdated Show resolved Hide resolved
adios2::Engine reader = io.Open(filename, adios2::Mode::Read);

std::string out;
for (size_t step = 0; step < NSTEPS; step++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we want to fix the number of steps since you could deduce that from the file. Or is this not true for remote access?

Suggested change
for (size_t step = 0; step < NSTEPS; step++)
for (; reader.BeginStep() == adios2::StepStatus::OK; ++step)
{

Would this work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of steps should be an input parameter, and, if missing, all steps in the BP files should be read.
Unfortunately the C++ function which retrieves the number of steps (like adios2_steps) is missing. We can change it like that and quite the loop on NSTEPS counter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it isn't missing, but that still doesn't mean you should use it. Engine::Steps() can only return useful information for file engines because streaming engines don't know when the stream will end until the writer closes it. (Generally I'd be surprised to find anything missing from the C++ interface that is present in others because most things are implemented in terms of C++.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the number of steps written is lower than your input variable I think this code would crash so I would still use what is returned by BeginStep in the for and break is you reached the desired number of steps.

Comment on lines 215 to 222
startX = (1 - ratio) / 2.0 * globalSizeX;
countX = globalSizeX * ratio;

startY = (1 - ratio) / 2.0 * globalSizeY;
countY = globalSizeY * ratio;

startZ = (1 - ratio) / 2.0 * globalSizeZ;
countZ = globalSizeZ * ratio;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused by what the ratio is (it might be good to add a comment below where you introduce all the parameters for the testing). I was assuming you want to control the amount of data you request so for a ratio of 0.5 the count is half in each direction. The start seems to be 1/4 of the entire direction. Is this just a random way of having a subselection or it represents something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ratio parameter is specified as an input parameter. The initial requirement was to implement test cases from the paper https://dl.acm.org/doi/10.1145/1996130.1996139 figure 3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, this makes sense. I would use the phrasing of the paper to describe each case, for this case the paper says: "An arbitrary area on an orthogonal plane"

vars_intersection = available_variables_str;
}

for (auto const &var_name : vars_intersection)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just parse over the required_variables?

Copy link
Contributor Author

@dmitry-ganyushin dmitry-ganyushin Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case that --variables parameter is not specified, It think the program should try to read all available variables.
Additionally, input parameters not always correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If input parameters are incorrect, the inquire variable function will return a null variable and you check for this so you would skip this variable anyway. So I would ask for available variables only if the provided variables is empty (I would also add a comment to be clear how the function should behave for any valid input). There is no point to do the intersection, it just makes the code harder to read.

Comment on lines 238 to 269
startX = (1 - ratio) / 2.0 * globalSizeX;
countX = globalSizeX * ratio;

startY = (1 - ratio) / 2.0 * globalSizeY;
countY = globalSizeY * ratio;

startZ = (1 - ratio) / 2.0 * globalSizeZ;
countZ = globalSizeZ * ratio;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is this ratio ? (maybe you should add a comment to explain below where you define the variables)
It seems you want to control the amount you are reading (ratio of the total data on that axes). My concern is with the start. For ratio of 0.5 you are bringing half the data starting from position globalSize/4
Is this just a random start point or it represents something?

Copy link
Contributor Author

@dmitry-ganyushin dmitry-ganyushin Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the ratio parameter corresponds to the amount of data that should be read. Probably it is the most short description of the task. Specifying starts and counts for every MPI rank is not really convenient for the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, now that I looked at the paper it makes sense. My confusion was with the start point, but it's just an arbitrary choice.

}
}

void read3DPlane(int nproc, int rank, const std::string &filename, const int nsteps, adios2::IO &io,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to add a comment for each of these functions so that it's clear what you are testing. In this case it seems you are bringing the XZ plane (with X data divided across ranks) and only a fixed ration of the Z data (using the weird start equation as starting point)


option longopts[] = {{"help", no_argument, NULL, 'h'},
{"case", required_argument, NULL, 'c'},
{"filename", required_argument, NULL, 'f'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you create the data as part of the test or where do you expect to get the input file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be actually used for reading data generated by adios_iotest program or any other data file, say generated by the S3D program.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, at some point we should add documentation on how to use this and examples of how you can generate the data for readthedocs

@anagainaru
Copy link
Contributor

In the adios_iotest directory may not a good place for this. adios_iotest is a complex tool, will this fit in there in some way? Or was this placement another WIP? I guess one way that it might fit in was if adios_iotest could be used to generate the data upon which the read test might rely. I know this has been used with data generated elsewhere, but to actually be useful as some kind of standard performance test that is deployed in a variety of situations (which was the goal), then it needs data, even if its dummy data.

+1 to Greg's question. I also think we need data for this test.
It is not clear to me how and by whom this will be ran.

@dmitry-ganyushin
Copy link
Contributor Author

Is this a testing for the CI or just a tool for people to use? If it's for manual performance testing, should we use TAU for performance measurements in addition to the time you are plotting for the entire test?

It can be used for automatic testing and hand testing. We never discussed that TAU should be used. I think it brings one more build dependency but probably not necessary for making measurements in 3 places.

@dmitry-ganyushin
Copy link
Contributor Author

In the adios_iotest directory may not a good place for this. adios_iotest is a complex tool, will this fit in there in some way? Or was this placement another WIP? I guess one way that it might fit in was if adios_iotest could be used to generate the data upon which the read test might rely. I know this has been used with data generated elsewhere, but to actually be useful as some kind of standard performance test that is deployed in a variety of situations (which was the goal), then it needs data, even if its dummy data.

+1 to Greg's question. I also think we need data for this test. It is not clear to me how and by whom this will be ran.

I moved it to separated adios_remote_read directory

@anagainaru
Copy link
Contributor

Is this a testing for the CI or just a tool for people to use? If it's for manual performance testing, should we use TAU for performance measurements in addition to the time you are plotting for the entire test?

It can be used for automatic testing and hand testing. We never discussed that TAU should be used. I think it brings one more build dependency but probably not necessary for making measurements in 3 places.

TAU already exists in adios2 so it's not an extra dependency. We don't have to use it but since this is a benchmarking tool, it would be nice having it. We can maybe add it to potential future work.

Revison issues addressed
Revision & Clean-up
@dmitry-ganyushin
Copy link
Contributor Author

Can this be merged now?

@pnorbert pnorbert requested a review from eisenhauer December 11, 2023 16:04
Copy link
Member

@eisenhauer eisenhauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, my comments:

  • In this form, it's not really adios_REMOTE_read. it's just a generic reading program (albeit one that only works if you have written type double variables).
  • "Ratio" is applied oddly. If ratio is .5 (implying you want to read half the data), you'll apply the .5 to each dimension. This means that for a 3D array you read 1/8 of the data.
  • I'm not at all clear on how ratio and direction interact, particularly in the nprocs>1 case. It looks like the countD (where D is a direction) values calculated as the result of applying the ratio are overwritten for that Direction (at least for the 3d Case) and instead each processor is just assigned a fraction of the data (1/nproc) in that direction.
  • there's no checking to see if the dimensionality requested actually applies to the variable.
  • I spent an extraordinary amount of time actually trying to get this to do something. If you don't specify --filename it silently doesn't do anything. If you specify a file that doesn't exist, it silently doesn't do anything. If you specify a variable that doesn't exist in the file, it silently doesn't do anything. If you specify a variable that isn't of type double, it silently doesn't do anything. If you specify a case it doesn't recognize, it silently doesn't do anything. If you specify to do a 3D read on a 1D variable, it gives you output, but it's not clear what it did, if anything. It wasn't immediately obvious to me when I first ran it, that it would never produce timing output to the terminal, but would instead write it to a file whose name is based upon the case, but which you can't otherwise control.
  • That timing is essentially done only on reading all specified variables on an entire file seems like a missed opportunty. One could measure open time, how long individual reads took, impact of sync vs. defer, etc.
  • I'll reiterate an earlier comment. I'm still not at all clear what role this program would play in a generalized test suite that might exercise remote read functionality. (To be specific, I know that it could do most anything, but it is maybe one component of a test suite that might include data generation, a performance test plan, etc. Whether or not this is useful as a component, is missing features, etc. is hard to say without any idea of a bigger picture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants