Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assign a table to a TableCollection #1489

Open
benjeffery opened this issue Jun 9, 2021 · 12 comments · Fixed by #2389
Open

Assign a table to a TableCollection #1489

benjeffery opened this issue Jun 9, 2021 · 12 comments · Fixed by #2389
Labels
enhancement New feature or request Python API Issue is about the Python API

Comments

@benjeffery
Copy link
Member

Currently this results in an error:

>>> tc = tskit.TableCollection(1)
>>> tc.individuals=tskit.IndividualTable()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

Is there any reason that it shouldn't be possible? The implementation of this would seem to be clearing the existing table and copying the table data across.

Came up in the context of #1487 (comment)

@benjeffery benjeffery added enhancement New feature or request Python API Issue is about the Python API labels Jun 9, 2021
@petrelharp
Copy link
Contributor

Makes sense to me!

@petrelharp
Copy link
Contributor

petrelharp commented Jun 9, 2021

However, since all these tables refer to each other, it doesn't seem that common that we'd actually have an IndividualTable that makes sense by itself - although I see the counter-example in #1487.

Or, I could imagine a nice way to modify, say, a mutation table would be

muts = tskit.MutationTable()
for mut in tc.mutations:
  muts.add_row( <some modification of mut> )
tc.mutations = muts

... although, now we could do

old_muts = tc.mutations
tc.mutations.clear()
for mut in old_muts:
  tc.mutations.add_row( ... )

@benjeffery
Copy link
Member Author

I agree, we have a lot of code that does the copy-clear-mutate-add/append dance. I prefer making a new table then assigning it as in your first example.

@jeromekelleher
Copy link
Member

jeromekelleher commented Jun 10, 2021

Sure, it makes sense. We would need to be clear that the semantics is this though:

old_table = tc.individuals
tc.individuals = new_table
assert tc.individuals == new_table
assert tc.individuals is not new_table
assert tc.individuals is old_table

The fact that we're not actually assigning the object itself is what put me off this in the past, as that seems a bit counter intuitive and contrary to what you might think of the semantics of assignment. The object ownership stuff holding the tables in a TableCollection is very tricky, and I definitely don't want to go messing with that.

@jeromekelleher
Copy link
Member

I guess another way of doing this would be to have a method replace or something, which does the same thing, which doesn't require us getting into these murky waters:

tc.individuals.replace(new_table)
assert tc.individuals == new_table
assert tc.individuals is not new_table

I'm not sure about the name, but "Replace the contents of this table with the contents of the specified table" seems like a clear definition?

@benjeffery
Copy link
Member Author

benjeffery commented Jun 10, 2021

I agree that it would be better to have this be a named method, rather than doing sneaky modification on assingment. We should add a helpful message to the error on assignment that signposts the named function. How about replace_with?

@benjeffery benjeffery added this to the Python upcoming milestone Jun 10, 2021
@jeromekelleher
Copy link
Member

Yes, that sounds good.

@petrelharp
Copy link
Contributor

If we don't just get to assign with = then I'm not sure there'd be a big advantage over the current way:

old_muts = tc.mutations.copy()
tc.mutations.clear()
for mut in old_muts:
  tc.mutations.add_row( ... )

@jeromekelleher
Copy link
Member

No harm in having two ways of doing it, though. It would make it more natural to pass around tables on their own.

@hyanwong
Copy link
Member

hyanwong commented Jul 7, 2022

Presumably the more efficient way is to set the columns rather than add each row (as this doesn't check the metadata):

import msprime
ts = msprime.simulate(10)
tables = ts.dump_tables()
node_map = np.arange(ts.num_nodes)[::-1]
new_table = ts.tables.nodes[node_map]  # new table with a reversed node order
tables.nodes.set_columns(
    flags=new_table.flags,
    time=new_table.time,
    population=new_table.population,
    individual=new_table.individual,
    metadata=new_table.metadata,
    metadata_offset=new_table.metadata_offset,
)

I think this should be fairly efficient?

@hyanwong
Copy link
Member

hyanwong commented Jul 7, 2022

Actually, should it be as simple as this in the BaseTable class:

    def replace_with(self, other):
        """
        Entirely replace this table with another
        """
        if type(self) != type(other):
            raise ValueError("replacement column is not of the same type")
        tables.nodes.set_columns(
            metadata_schema = repr(other.metadata_schema),  # encode the schema
            **{column: getattr(other, column) for column in self.column_names}
        )

@hyanwong hyanwong mentioned this issue Jul 7, 2022
3 tasks
@mergify mergify bot closed this as completed in #2389 Jul 12, 2022
@benjeffery
Copy link
Member Author

I'm reopening this as we haven't done assignment, but also haven't decided to not do it!

@benjeffery benjeffery reopened this Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Python API Issue is about the Python API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants