Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unite_master_parts breaks topological order of descendants and thus breaks cascading of table drops #1057

Closed
ZhuokunDing opened this issue Sep 30, 2022 · 2 comments · Fixed by #1184
Assignees
Labels

Comments

@ZhuokunDing
Copy link

Bug Report

Description

unite_master_parts does not always preserve topological order when reordering the list of tables and causes issues when dropping tables.

Reproducibility

  • Environment: datajoint codebook (python 3.9.5, datajoint 0.13.8)
  • Minimum steps to reproduce the issue:
import datajoint as dj

schema = dj.Schema('zhuokun_test')

@schema
class A(dj.Manual):
    definition = """
    a: int
    """

@schema
class M1(dj.Manual):
    definition = """
    -> A
    """
    
@schema
class M(dj.Manual):
    definition = """
    -> A
    """
    
    class Part(dj.Part):
        definition = """
        -> master
        -> M1
        """
A.drop()
  • Logs:
[2022-09-30 21:17:15,746][INFO]: Connecting [email protected]:3306
[2022-09-30 21:17:15,864][INFO]: Connected [email protected]:3306
`zhuokun_test`.`a` (0 tuples)
`zhuokun_test`.`m` (0 tuples)
`zhuokun_test`.`m__part` (0 tuples)
`zhuokun_test`.`m1` (0 tuples)
Proceed? [yes, No]:  yes
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Input In [1], in <cell line: 28>()
    23     class Part(dj.Part):
    24         definition = """
    25         -> master
    26         -> M1
    27         """
---> 28 A.drop()

File /opt/conda/lib/python3.9/site-packages/datajoint/table.py:646, in Table.drop(self)
   644 if do_drop:
   645     for table in reversed(tables):
--> 646         FreeTable(self.connection, table).drop_quick()
   647     print("Tables dropped.  Restart kernel.")

File /opt/conda/lib/python3.9/site-packages/datajoint/table.py:605, in Table.drop_quick(self)
   603 if self.is_declared:
   604     query = "DROP TABLE %s" % self.full_table_name
--> 605     self.connection.query(query)
   606     logger.info("Dropped table %s" % self.full_table_name)
   607     self._log(query[:255])

File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:340, in Connection.query(self, query, args, as_dict, suppress_warnings, reconnect)
   338 cursor = self._conn.cursor(cursor=cursor_class)
   339 try:
--> 340     self._execute_query(cursor, query, args, suppress_warnings)
   341 except errors.LostConnectionError:
   342     if not reconnect:

File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:296, in Connection._execute_query(cursor, query, args, suppress_warnings)
   294         cursor.execute(query, args)
   295 except client.err.Error as err:
--> 296     raise translate_query_error(err, query)

File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:294, in Connection._execute_query(cursor, query, args, suppress_warnings)
   291         if suppress_warnings:
   292             # suppress all warnings arising from underlying SQL library
   293             warnings.simplefilter("ignore")
--> 294         cursor.execute(query, args)
   295 except client.err.Error as err:
   296     raise translate_query_error(err, query)

File /opt/conda/lib/python3.9/site-packages/pymysql/cursors.py:148, in Cursor.execute(self, query, args)
   144     pass
   146 query = self.mogrify(query, args)
--> 148 result = self._query(query)
   149 self._executed = query
   150 return result

File /opt/conda/lib/python3.9/site-packages/pymysql/cursors.py:310, in Cursor._query(self, q)
   308 self._last_executed = q
   309 self._clear_result()
--> 310 conn.query(q)
   311 self._do_get_result()
   312 return self.rowcount

File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:548, in Connection.query(self, sql, unbuffered)
   546     sql = sql.encode(self.encoding, "surrogateescape")
   547 self._execute_command(COMMAND.COM_QUERY, sql)
--> 548 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
   549 return self._affected_rows

File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:775, in Connection._read_query_result(self, unbuffered)
   773 else:
   774     result = MySQLResult(self)
--> 775     result.read()
   776 self._result = result
   777 if result.server_status is not None:

File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:1156, in MySQLResult.read(self)
  1154 def read(self):
  1155     try:
-> 1156         first_packet = self.connection._read_packet()
  1158         if first_packet.is_ok_packet():
  1159             self._read_ok_packet(first_packet)

File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:725, in Connection._read_packet(self, packet_type)
   723     if self._result is not None and self._result.unbuffered_active is True:
   724         self._result.unbuffered_active = False
--> 725     packet.raise_for_error()
   726 return packet

File /opt/conda/lib/python3.9/site-packages/pymysql/protocol.py:221, in MysqlPacket.raise_for_error(self)
   219 if DEBUG:
   220     print("errno =", errno)
--> 221 err.raise_mysql_exception(self._data)

File /opt/conda/lib/python3.9/site-packages/pymysql/err.py:143, in raise_mysql_exception(data)
   141 if errorclass is None:
   142     errorclass = InternalError if errno < 1000 else OperationalError
--> 143 raise errorclass(errno, errval)

OperationalError: (3730, "Cannot drop table 'm1' referenced by a foreign key constraint 'm__part_ibfk_2' on table 'm__part'.")

Expected Behavior

Notice that datajoint was trying to drop M1 before dropping M.Part. The expected behavior should be dropping M.Part, M, and then M1.

Additional Research and Context

I dig into the code a little bit and realized datajoint would first collect all descendants of the table to drop, sort them by topological order, reorder the sorted list to unite master and part tables (unite_master_parts, which should not disrupt the topological sort), and then drop the tables in topological order.
However, unite_master_parts does not actually preserve topological order.
In the above example, the topologically sorted list of tables happens to be:

['`zhuokun_test`.`a`',
 '`zhuokun_test`.`m`',
 '`zhuokun_test`.`m1`',
 '`zhuokun_test`.`m_part`']

and unite_master_parts would reorder the list to be:

['`zhuokun_test`.`a`',
 '`zhuokun_test`.`m`',
 '`zhuokun_test`.`m_part`'
  '`zhuokun_test`.`m1`',]

which is not topologically sorted and would cause issues when dropping tables.
This issue is probably also related to the behavior mentioned in #927 (comment)

@dimitri-yatsenko
Copy link
Member

Great catch!

@dimitri-yatsenko
Copy link
Member

dimitri-yatsenko commented Oct 7, 2022

Thank you for reproducing the problem so well. Yes, the current unite_master_part is based on a faulty assumption that failed in this case. The algorithm meant to bring parts closer to their masters for transaction processing without breaking topological sorting. Fixing...

dimitri-yatsenko added a commit to dimitri-yatsenko/datajoint-python that referenced this issue Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants