Saving a MAT file in v73 requires much more time than v5 with the same data #261

Open
zhengliuer opened this issue Nov 13, 2024 · 4 comments

zhengliuer commented Nov 13, 2024

Hi, I am writing data (a struct array) to a .mat file using matio (actually matio-cpp, the C++ wrapper around matio), and I noticed that writing the data as v73 takes about 5x as long as writing it as v5. Is this expected?

I am trying to give sample code here, but it still has an error that I have not figured out. I will put it here anyway and update it if I find the cause.

#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

#include <matiocpp/matioCpp.h>


int main(int argc, char** argv)
{
    std::string sDataPath = "data.mat";
    std::string sOutputV5 = "save_data_v5.mat";
    std::string sOutputV73 = "save_data_v73.mat";

    // Open the existing input file read-only (File::Create would instead create a new, empty file at this path).
    matioCpp::File matFile(sDataPath, matioCpp::FileMode::ReadOnly);
    matioCpp::StructArray data = matFile.read("data").asStructArray();


    auto startTime = std::chrono::high_resolution_clock::now();
    // Read data, save as v5
    matioCpp::File matFileV5 = matioCpp::File::Create(sOutputV5, matioCpp::FileVersion::MAT5);
    matFileV5.write(data);

    auto endV5Time = std::chrono::high_resolution_clock::now();
    uint64_t v5Duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(endV5Time - startTime).count();
    std::cout << "v5 writing cost " << v5Duration_ms << "ms\n";

    // Read data, save as v73
    matioCpp::File matFileV73 = matioCpp::File::Create(sOutputV73, matioCpp::FileVersion::MAT7_3);
    matFileV73.write(data);

    auto endV7Time = std::chrono::high_resolution_clock::now();
    uint64_t v73Duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(endV7Time - endV5Time).count();
    std::cout << "v73 writing cost " << v73Duration_ms << "ms\n";

}

data.txt
Remember to rename the file to data.mat.

Best,
Zheng

tbeu (Owner) commented Nov 18, 2024

@zhengliuer Why is it closed? Is it just no longer relevant?

tbeu (Owner) commented Nov 23, 2024

Here's a benchmark that uses native matio:

#include <cstdlib>
#include <iostream>
#include <string>
#include <chrono>

#include "matio.h"

int main(int argc, char *argv[])
{
  std::string sDataPath = "data.mat";
  std::string sOutputV5 = "save_data_v5.mat";
  std::string sOutputV73 = "save_data_v73.mat";

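  // Open the input file and read the variable, bailing out early if either step fails.
  const auto matFileR = Mat_Open(sDataPath.c_str(), MAT_ACC_RDONLY);
  if (matFileR == nullptr) {
    std::cerr << "Cannot open " << sDataPath << "\n";
    return EXIT_FAILURE;
  }
  const auto data = Mat_VarRead(matFileR, "data");
  if (data == nullptr) {
    std::cerr << "Variable 'data' not found in " << sDataPath << "\n";
    Mat_Close(matFileR);
    return EXIT_FAILURE;
  }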

  const auto bench = [&](const std::string& path, mat_ft ver, const char* verStr) {
    const auto startTime = std::chrono::high_resolution_clock::now();
    const auto matFileW = Mat_CreateVer(path.c_str(), nullptr, ver);
    Mat_VarWrite(matFileW, data, MAT_COMPRESSION_NONE);
    const auto endTime = std::chrono::high_resolution_clock::now();
    const auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
    Mat_Close(matFileW);
    std::cout << verStr << " writing cost " << duration_ms << "ms\n";
  };

  bench(sOutputV5, MAT_FT_MAT5, "v5");
  bench(sOutputV73, MAT_FT_MAT73, "v73");

  Mat_VarFree(data);
  Mat_Close(matFileR);
}

I can confirm the performance degradation with v73:

v5 writing cost 56ms
v73 writing cost 337ms
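
The figures above come from a single run per format, and the timed region ends before `Mat_Close`. As a rough sketch of how the comparison could be made more robust, the helper below repeats a workload and returns the median duration. `medianMillis` is a hypothetical name, not part of matio or matio-cpp, and the workload is passed in as a callable.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of matio): runs `work` `repeats` times and
// returns the median wall-clock duration in milliseconds.
template <typename Work>
double medianMillis(Work&& work, std::size_t repeats)
{
    if (repeats == 0) {
        return 0.0;  // nothing was measured
    }
    std::vector<double> samples;
    samples.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) {
        // steady_clock is monotonic, so a system clock adjustment cannot skew a sample.
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    // For an even number of samples, average the two middle values.
    return (samples.size() % 2 == 1) ? samples[mid]
                                     : 0.5 * (samples[mid - 1] + samples[mid]);
}
```

To time one format, the callable would wrap the `Mat_CreateVer`, `Mat_VarWrite`, and `Mat_Close` calls from the benchmark, so that any work deferred until the file is closed is counted as well. Whether closing contributes noticeably here is an assumption that would need to be measured.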

zhengliuer reopened this Nov 26, 2024

@zhengliuer Why is it closed? Is it just no longer relevant?

Because I was occupied with other things and couldn't find the bug in my code. Now I have time for this.

zhengliuer (Author) commented Nov 26, 2024

The test result shows a performance degradation similar to what I see in my program. v73 uses the HDF5 format, but I am not sure whether that is why it takes so much more time than v5.
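
One way to narrow this down might be to time the phases of a write separately. The sketch below is a small lap timer. `PhaseTimer` is a hypothetical helper, not part of matio, and it uses only the standard library.

```cpp
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of matio): records the time elapsed between
// consecutive lap() calls so each phase of a write can be reported on its own.
class PhaseTimer
{
    using Clock = std::chrono::steady_clock;

public:
    PhaseTimer() : last_(Clock::now()) {}

    // Stores the time since construction, or since the previous lap, under `name`.
    void lap(const std::string& name)
    {
        const auto now = Clock::now();
        laps_.emplace_back(name, std::chrono::duration<double, std::milli>(now - last_).count());
        last_ = now;
    }

    // Recorded (phase name, milliseconds) pairs in call order.
    const std::vector<std::pair<std::string, double>>& laps() const { return laps_; }

    // Sum of all recorded laps in milliseconds.
    double totalMillis() const
    {
        double total = 0.0;
        for (const auto& entry : laps_) {
            total += entry.second;
        }
        return total;
    }

private:
    Clock::time_point last_;
    std::vector<std::pair<std::string, double>> laps_;
};
```

In the native benchmark, `lap("create")` could follow `Mat_CreateVer`, `lap("write")` could follow `Mat_VarWrite`, and `lap("close")` could follow `Mat_Close`. Comparing the laps for v5 and v73 would show which phase accounts for the difference. It would not by itself prove that HDF5 is the cause.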
