Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for QF_SNIA #144

Merged
merged 5 commits into from
May 16, 2024
Merged

Add support for QF_SNIA #144

merged 5 commits into from
May 16, 2024

Conversation

jurajsic
Copy link
Member

No description provided.

@jurajsic
Copy link
Member Author

# of formulae: 103318
--------------------------------------------------
tool                            ✅    ❌       time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ------  ----  ---------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-f552b5f-d95fe13  102191  1127   14077.39   0.14   0.02   1.52  62776    39415        628   453        46        0
z3-noodler-6802421-d95fe13  102191  1127   12819.51   0.13   0.01   1.59  62783    39408        639   437        51        0
cvc5-1.1.2                   99699  3619   76478.24   0.77   0.00   6.77  60798    38901          2  3608         9        0
z3-4.13.0                    97687  5631  132036.99   1.35   0.01   8.11  58874    38813        210  4960       461        0
--------------------------------------------------

There is slight degradation for the time of solving, but otherwise it looks like there are no problems.

@jurajsic
Copy link
Member Author

For QF_SNIA formulae:

Benchmark snia
# of formulae: 70
--------------------------------------------------
tool                          ✅    ❌    time     avg     med     std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  ------  ------  ------  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-f552b5f-d95fe13    70     0    0.98    0.01    0.01    0.01     70        0          0     0         0        0
cvc5-1.1.2                    70     0    0.00    0.00    0.00    0.00     70        0          0     0         0        0
z3-4.13.0                      0    70    0.00  nan     nan     nan         0        0          0     0         0       70
ostrich-e386836db             70     0  119.74    1.71    1.70    0.20     70        0          0     0         0        0
--------------------------------------------------

Z3 returns correctly sat for all 70 instances, but also prints some error, so it does not show up correctly in the table.

@jurajsic jurajsic marked this pull request as ready for review May 15, 2024 23:25
@jurajsic jurajsic requested a review from vhavlena May 15, 2024 23:25
@vhavlena
Copy link
Collaborator

How come we have now less unknowns but more TOs than before?

@jurajsic
Copy link
Member Author

Benchmark str_small_rw
# of formulae: 1880
--------------------------------------------------
tool                          ✅    ❌     time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-f552b5f-d95fe13  1762   118   341.40   0.19   0.01   2.75      4     1758         92    17         9        0
z3-noodler-6802421-d95fe13  1752   128   399.87   0.23   0.01   3.38      5     1747        103    13        12        0
cvc5-1.1.2                  1861    19    31.09   0.02   0.00   0.56     20     1841          2    17         0        0
z3-4.13.0                   1821    59   214.88   0.12   0.01   2.86     25     1796          0    59         0        0
ostrich-e386836db           1709   171  7478.26   4.38   1.63   6.54     19     1690          0   171         0        0
--------------------------------------------------

Benchmark pyex
# of formulae: 23845
--------------------------------------------------
tool                           ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
--------------------------  -----  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-f552b5f-d95fe13  23747     98    4970.47   0.21   0.11   1.56  20377     3370          0     96         2        0
z3-noodler-6802421-d95fe13  23754     91    4734.25   0.20   0.10   1.84  20383     3371          0     89         2        0
cvc5-1.1.2                  23826     19    6384.00   0.27   0.07   2.48  20406     3420          0     19         0        0
z3-4.13.0                   22858    987   54817.37   2.40   0.12  10.23  19587     3271          0    987         0        0
ostrich-e386836db           11017  12828  353972.36  32.13  20.71  35.14   7801     3216          0  12828         0        0
--------------------------------------------------

Benchmark woorpje
# of formulae: 809
--------------------------------------------------
tool                          ✅    ❌     time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-f552b5f-d95fe13   771    38   174.18   0.23   0.02   2.30    609      162          0    21        17        0
z3-noodler-6802421-d95fe13   774    35   252.34   0.33   0.01   4.49    612      162          0    17        18        0
cvc5-1.1.2                   755    54    45.21   0.06   0.01   0.23    591      164          0    54         0        0
z3-4.13.0                    782    27   300.36   0.38   0.02   4.80    618      164          0    27         0        0
ostrich-e386836db            755    54  6876.45   9.11   4.60  15.58    589      166          0    51         0        3
--------------------------------------------------

This three benchmarks have the most different number of solved instances, I would say it's just Z3 randomness.
Here is also scatter plot:
image

@vhavlena
Copy link
Collaborator

You are comparing with a wrong version o Z3-Noodler (you are using the version with length decision procedure)

@jurajsic
Copy link
Member Author

You are comparing with a wrong version o Z3-Noodler (you are using the version with length decision procedure)

Are you sure? According to list_of_tools.txt I am comparing with the version from #140.

@jurajsic
Copy link
Member Author

Hmm, it seems that 6802421 is older than 82db97c in #140, here is comparison with the newer one:

# of formulae: 103318
--------------------------------------------------
tool                            ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
--------------------------  ------  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-f552b5f-d95fe13  102191   1127   14077.39   0.14   0.02   1.52  62776    39415        628    453        46        0
z3-noodler-82db97c-d95fe13  102200   1118   12913.53   0.13   0.01   1.64  62783    39417        632    434        52        0
cvc5-1.1.2                   99699   3619   76478.24   0.77   0.00   6.77  60798    38901          2   3608         9        0
z3-4.13.0                    97687   5631  132036.99   1.35   0.01   8.11  58874    38813        210   4960       461        0
ostrich-e386836db            84819  18499  674023.96   7.95   1.62  17.80  46667    38152          0  15197         0     3302
--------------------------------------------------

image

@jurajsic
Copy link
Member Author

I turned off other (unneeded) theories again, like before, the results:

# of formulae: 103318
--------------------------------------------------
tool                            ✅    ❌       time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ------  ----  ---------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-b7b8ad5-d95fe13  102204  1114   12937.44   0.13   0.01   1.63  62783    39421        628   435        51        0
z3-noodler-82db97c-d95fe13  102200  1118   12913.53   0.13   0.01   1.64  62783    39417        632   434        52        0
cvc5-1.1.2                   99699  3619   76478.24   0.77   0.00   6.77  60798    38901          2  3608         9        0
z3-4.13.0                    97687  5631  132036.99   1.35   0.01   8.11  58874    38813        210  4960       461        0
--------------------------------------------------

image

QF_SNIA:

# of formulae: 70
--------------------------------------------------
tool                          ✅    ❌    time     avg     med     std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  ------  ------  ------  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-b7b8ad5-d95fe13    70     0    0.82    0.01    0.01    0.01     70        0          0     0         0        0
cvc5-1.1.2                    70     0    0.00    0.00    0.00    0.00     70        0          0     0         0        0
z3-4.13.0                      0    70    0.00  nan     nan     nan         0        0          0     0         0       70
--------------------------------------------------

Copy link
Collaborator

@vhavlena vhavlena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice!

@jurajsic jurajsic merged commit e30c1c6 into devel May 16, 2024
@jurajsic jurajsic deleted the snia branch May 16, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants