This notebook demonstrates how to convert the generally unhelpful MicroStrategy schema into human-readable JSON. We can then use a diagramming library such as JointJS or GoJS to draw aesthetically pleasing entity relationship diagrams.
We will use the schema from the MicroStrategy Tutorial. The schema can be obtained in Developer from 'Schema' > 'Export Project Schema'. Export the Table Catalog - Logical View which contains a list of all warehouse tables and their attributes and facts.
from pyexcel_xls import get_data
import json
data = get_data("schema.xls")
logical_view = data["Logical View"]
temp = [[item[0], item[1], item[2]] for item in logical_view]
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
import pandas as pd
df = pd.DataFrame(temp[1:], columns = logical_view[0])
df.head(10)
Table Name | Object | Type | |
---|---|---|---|
0 | CITY_CTR_SLS | Call Center | Attribute (K) |
1 | CITY_CTR_SLS | Customer City | Attribute (K) |
2 | CITY_CTR_SLS | Cost | Fact |
3 | CITY_CTR_SLS | Profit | Fact |
4 | CITY_CTR_SLS | Revenue | Fact |
5 | CITY_CTR_SLS | Units Sold | Fact |
6 | CITY_CTR_SLS | Gross Revenue | Fact |
7 | CITY_MNTH_SLS | Month | Attribute (K) |
8 | CITY_MNTH_SLS | Customer City | Attribute (K) |
9 | CITY_MNTH_SLS | Cost | Fact |
# form node list
graph_spec = {"nodes": [], "links": []}
df2 = df["Table Name"].drop_duplicates()
for table_id in df2:
props_list = []
temp = pd.DataFrame()
temp = df[df["Table Name"] == table_id]
for index, row in temp.iterrows():
props_list.append(row["Object"])
graph_spec["nodes"].append({"id": table_id, "props": props_list})
# cartesian product to find possible edges
df3 = pd.merge(df, df, how='left', on='Object')
# drop rows if Type eq Fact because can't join on Facts
df3 = df3[df3.Type_x != "Fact"]
# drop rows if same Table Name because it is a duplicate record
df3 = df3[df3["Table Name_x"] != df3["Table Name_y"]]
df3.head(10)
Table Name_x | Object | Type_x | Table Name_y | Type_y | |
---|---|---|---|---|---|
1 | CITY_CTR_SLS | Call Center | Attribute (K) | DAY_CTR_SLS | Attribute (K) |
2 | CITY_CTR_SLS | Call Center | Attribute (K) | F_TUTORIAL_TARGETS | Attribute (K) |
3 | CITY_CTR_SLS | Call Center | Attribute (K) | ITEM_CCTR_MNTH_SLS | Attribute (K) |
4 | CITY_CTR_SLS | Call Center | Attribute (K) | LU_CALL_CTR | Attribute (K) |
5 | CITY_CTR_SLS | Call Center | Attribute (K) | LU_EMPLOYEE | Attribute |
6 | CITY_CTR_SLS | Call Center | Attribute (K) | SUBCATEG_MNTH_CTR_SLS | Attribute (K) |
8 | CITY_CTR_SLS | Customer City | Attribute (K) | CITY_MNTH_SLS | Attribute (K) |
9 | CITY_CTR_SLS | Customer City | Attribute (K) | CITY_SUBCATEG_SLS | Attribute (K) |
10 | CITY_CTR_SLS | Customer City | Attribute (K) | LU_CUST_CITY | Attribute (K) |
11 | CITY_CTR_SLS | Customer City | Attribute (K) | LU_CUSTOMER | Attribute |
# only keep unique pairs of nodes
unique_pairs = set()
temp = []
for index, row in df3.iterrows():
if ((row[0], row[3]) in unique_pairs):
pass
else:
unique_pairs.add((row[0], row[3]))
temp.append(row)
df4 = pd.DataFrame(temp)
for index, row in df4.iterrows():
graph_spec["links"].append({"source": row[0], "target": row[3]})
import json
with open('data.json', 'w') as f:
json.dump(graph_spec, f)