ETL - BI - Users (Ops Package)
Last Updated 5/26/2021
Overview
Location
Schedule
Details
Packages
master
user-stage-dw
tableau-sync
Python Modules
ad_load_dw.py
tableau_load_dw.py
tableau_update.py
tableau_helper.py
ad_helper.py
dw_helper.py
Change Log
Troubleshooting Notes
Overview:
The project that stages BI users is the data warehouse's link to Azure AD and uses Microsoft Graph to stage all members of "BI-" AD groups into staging tables in the warehouse and also to sync that membership to Tableau Online.
The ETL has two main components. A python package that pulls down data from Azure AD (Microsoft Graph) and from Tableau, and then a python package that syncs membership lists in Tableau.
The primary purpose of this job is to generate user lists for user dashboard and to sync membership with Tableau. Uses include:
- Populate staging table schemas with azure ad and tableau membership data and sync back to Tableau
- dataMarq procs to build usership table
- sp_d_bi_users
Build d_bi_users table that stores membership in various BI groups
- sp_d_bi_users
Schemas in staging that are populated with this data include:
- Base schema: ad (for azure ad data), tab (for tableau data)
- CDC schema: N/A
Location:
The project – ETL-BI-Users– contains all the packages and python code and resides in the BI-Ops folder in the SSIS catalog.
Schedule:
The jobs in this project run as part of a single SQL Agent jobs (full dataMarq schedule).
ETL – BI - Users – Everyday at 12:30pm and 8:30pm
Project and Package Details:
The packages in ETL – BI - Users work with the both the Microsoft Graph API and the Tableau API and use python modules to interact with these web services.
Project Parameters
dbEnvr: INT, STG, PRD – which DB should be loaded with the data. Passed to python program as –dbEvnr argument
dwConnStr: connection to datamarq
dwStgConnStr: connection to datamarq_staging
gitEnvr: DEV or PROD. Because these jobs use python packages they python code is stored in our git folder on the BI shared drive. For testing, we are able to point at the python code in DEV or in PROD.
groupFilter: This is passed to the Microsoft Graph API to only pull membership information for groups that match this filter. Generall we pass "BI-" to get all BI groups but can do specific groups for testing.
logLevel: DEBUG, INFO, WARNING, ERROR, CRITICAL – This is passed to the python scripts as an argument to set the log level for the jobs
pythonPath: The path to the python executable – saved in the D:/ drive on servers but usually in C:/ if executing locally/testing.
reportsOnly: True or False, passed to the AD stage job to only stage reports for the other jobs that use these same python codes.
Packages
MASTER
Overview
This primary orchestration package, it executes other "worker" packages in the project.
Package Parameters
None
Package Variables
None
Package Flow
- Execute the user_stage_dw package
- Execute the tableau_sync package
- Execute the user table stored procedure
bi.sp_d_bi_users
USER-STAGE-DW
Overview
This package has two tasks that fire of the python scripts that stage the data from Azure AD and Tableau.
Package Parameters
None
Package Variables
None
Package Flow
- Fire the python module ad_load_dw.py that stages all the BI groups and members into the staging database. Note the use of the project parameters and the need to encapsulate every command in quotation marks
"\"//marqnet.mu.edu/depts/ITS/BI/git/" + @[$Project::gitEnvr] + "/DW/ETL-BI/ad-load-sync/ad_load_dw.py\" \"-dbEnvr\" \"" + @[$Project::dbEnvr] + "\" \"-groupFilter\" \"" + @[$Project::groupFilter] + "\" \"-log\" \"" + @[$Project::logLevel] + "\"" - Fire the python module tableau_load_dw.py that stages all Tableau groups and members into the staging database. Note the use of the project parameters and the need to encapsulate every command in quotation marks
"\"\\\\marqnet.mu.edu\\depts\\ITS\\BI\\git\\" + @[$Project::gitEnvr] + "\\DW\\ETL-BI\\ad-load-sync\\tableau_load_dw.py\" \"-dbEnvr\" \"" + @[$Project::dbEnvr] + "\" \"-reportsOnly\" \"" + @[$Project::reportsOnly] + "\" \"-log\" \"" + @[$Project::logLevel] + "\""
TABLEAU-SYNC
Overview
This package has two tasks, it execute the python module that syncs BI group users and members to Tableau, and also re-stages the Tableau membership detail after the sync is done so the staging tables are aligned. Note that it is membership in the BI-Tableau AD group that gets a person added to Tableau as a view. Also, to be made a group in Tableau, a group has to be listed on the staging table tab.tab_auth_groups. Only groups that are listed on this table will be made groups in Tableau Online.
Package Parameters
None
Package Variables
None
Package Flow
- Fire the python module tableau_update.py that syncs Azure AD group membership and users into Tableau Online. Note the use of the project parameters and the need to encapsulate every command in quotation marks
"\"\\\\marqnet.mu.edu\\depts\\ITS\\BI\\git\\" + @[$Project::gitEnvr] + "\\DW\\ETL-BI\\ad-load-sync\\tableau_update.py\" \"--dbEnvr\" \"" + @[$Project::dbEnvr] + "\" \"--log\" \"" + @[$Project::logLevel] + "\"" - Fire the python module tableau_load_dw.py that stages all Tableau groups and members into the staging database. This realigns what is in the staging database after the sync. Note the use of the project parameters and the need to encapsulate every command in quotation marks
"\"\\\\marqnet.mu.edu\\depts\\ITS\\BI\\git\\" + @[$Project::gitEnvr] + "\\DW\\ETL-BI\\ad-load-sync\\tableau_load_dw.py\" \"--dbEnvr\" \"" + @[$Project::dbEnvr] + "\" \"--reportsOnly\" \"" + @[$Project::reportsOnly] + "\" \"--log\" \"" + @[$Project::logLevel] + "\""
Python Modules
Qualtrics data is accessed through a web service which dataMarq achieves through the use of python modules. These all reside in the ETL-Qualtrics solution – and therefore in the git repository – alongside the SSIS packages, which call them. They also require a config file that is stored in the BI shared drive, config folder.
Qualtrics API resource: https://api.qualtrics.com/api-reference/
ad_load_dw.py
This program loads the BI group data (users, groups and group members) into the ad staging schema in the data warehouse.
Parameters
dbEnvr – INT, STG, PRD
groupFilter – This is passed to the Microsoft Graph API to only pull membership information for groups that match this filter. Generall we pass "BI-" to get all BI groups but can do specific groups for testing.
logLevel – the level to log to the log folder for the job (WARNING, INFO, ERROR, DEBUG)
Quasi code flow
- Set the config file path based on the dbEnvr. Configs are stored in the BI shared drive folder "config".
- Get the refresh token to authenticate to Microsoft graph. This calls the get_token function in the ad_helper.py module to refresh the token on each run.The tokens are stored in the BI shared drive folder "tokens"
- Get the groups list using the token and web endpoint by calling the get_group_dict fuction in the ad_helper.py module. This calls the webservice and gets every group that matches the given groupFilter.
- Get the membership lists from Azure AD by calling the get_update_list function in the ad_helper.py module to get a list of members for each BI group.
- Connect to the staging database in the data warehouse by calling db_connect in the dw_helper.py module
- Get the tables, fields and values to be inserted into the ad staging tables. The tables and fields are stored in the config file, the values come from the lists generate in the step above.
- Truncate tables and load all values just pulled down from Azure AD into the staging tables for group, users and group members. This is done by calling functions from the dw_helper.py module, including trunc_table, insert_rows and various "update" functions.
tableau_load_dw
This program loads the user and group data (users, groups and group members) from Tableau into the tab staging schema in the data warehouse.
Parameters
dbEnvr – INT, STG, PRD
reportsOnly – True or False, passed to the API to only load reportst for the reports only load
logLevel – the level to log to the log folder for the job
Quasi code flow
- Set the config file path based on the dbEnvr
- Create a connection to Tableau Online using the tableau_connect function in the tableau_helper.py module
- Populate lists of views and permission using the functions in the tableau_helper.py module (get_workbook_views, get_workbook_permissions)
- If the reportsOnly variable is "False", populate userList and groupList with lists from Tableau API
- Create a list of user details for insertion into the staging table
- Create a list of groups and member details for insertion into the staging table
- Connect to the staging database in the data warehouse by calling db_connect in the dw_helper.py module
- Get the tables, fields and values to be inserted into the tableau staging tables. The tables and fields are stored in the config file, the values come from the lists generate in the step above.
- Truncate tables and load all values just pulled down from Tableau into the staging tables for group, users and group members. This is done by calling functions from the dw_helper.py module, including trunc_table, insert_rows and various "update" functions.
tableau_ad_sync.py (decomissioned)
No longer in use.
tableau_update.py
This program sync data from the ad staging tables for users and groups members to Tableau Online. It only syncs groups that are identified for upload to Tableau and group members who have been identified as Tableau viewers.
Parameters
dbEnvr– INT, STG, PRD
logLevel – the level to log to the log folder for the job
Quasi code flow
- Set the config file path based on the dbEnvr
- Create a connection to Tableau Online using the tableau_connect function in the tableau_helper.py module
- Populate a list of groups and users that need changing in Tableau Online by using the get_group_compare and get_user_compare functions in the dw_helper.py module
- Add and delete groups, users and group membership using the update_tableau_groups and update_tableau_users function from the tableau_helper.py module
ad_helper.py
All of the functions that are used to interact with the Azure Microsoft Graph APIs. These are mostly derived from the python module "adal" which is open source and created for interacting with Azure AD. Details are here: https://adal-python.readthedocs.io/en/latest/aa
api_endpoint
Parameters:
url - the url to turn into an approprirate endpoint
Function: Converts a relative path into a full uri based on the adal endpoint
Output: server connection object
get_token
Parameters:
None
Function: Takes a credential - stored in the config file - and acquires an authentication token to interact with Azure AD
Output: token, tokenType
get_response
Parameters:
url - The url endpoint for a webservice call
headers - The headers to pass to the request to the web service
Function: A generic function that calls the requests module to get a response from a web service. The response is transformed into JSON for the output variable
Output: nextPage - if the response has a next page url, JSON - the response in JSON format
get_group_dict
Parameters:
token - The token for Azure AD access
tokenType - The token type that is passed (refresh, access, etc.)
params - Any specific parameters to tag onto the request, many Azure AD request take specific parameters
Function: Takes a token to Azure AD and calls the groups request end point to get a list of all the groups that match a certain param. For this process we generally pass through the filter for group that start with "BI-"
Output: groupDict - a dictionary of groups from Azure AD
get_member_dict
Parameters:
groupId - The Azure AD group id for a specific group
token - The token for Azure AD access
tokenType - The token type that is passed (refresh, access, etc.)
params - Any specific parameters to tag onto the request, many Azure AD request take specific parameters
Function: Takes a token to Azure AD and calls the groups members request end point to get a list of all the groups members for a given group. These are then stored in a dictionary with specific user details
Output: memberDict - a dictionary of group members, with upn as the key and various other pieces of info on the user like displayName, department, jobTitle, etc.
get_update_lists
Parameters:
token - The token for Azure AD access
tokenType - The token type that is passed (refresh, access, etc.)
groupDict - dictionary object of AD groups
Function: Takes a token to Azure AD and a dictionary of groups and calls the get_member_dict function (see above) for each group, creating lists of groups, group members and individual users in separate lists
Output: groupList, memberList, userList
tableau_helper.py
All of the functions that are used to interact with the tableau APIs. These are mostly derived from the Tableau python package created by Tableau and offered for free. Details are here: https://tableau.github.io/server-client-python/docs/
tableau_connect
Parameters:
None
Function: Uses data from the config file to connect to the Tableau Online instance
Output: server connection object
get_tableau_user_list
Parameters:
server – The connected Tableau server object
Function: Takes the server connection object and returns a list of all users
Output: userList - list of all users on the Tableau Online instance
get_tableau_group
Parameters:
server – The connected Tableau server object
Function: Takes the server connection object and returns a list of all groups
Output: groupList - list of all groups on the Tableau Online instance
get_tableau_group_members
Parameters:
server – The connected Tableau server object
groupList - The list of groups on Tableau
Function: Takes the server connection and a list of groups and loops through each group populating a list of members of that group.
Output: groupDetails - the details for each group, memberDetails - the details for each member of the group
get_user_item
Parameters:
server – The connected Tableau server object
upn - The email of the user
Function: Takes a server connection and the upn of a user and gets the details for that user
Output: userItem - the user Object from Tableau
get_group_item
Parameters:
server – The connected Tableau server object
group_name - The name of the group
Function: Gets the group object for the group passed to the server object
Output: groupItem - the group Object from Tableau
get_tableau_workbook_list
Parameters:
server – The connected Tableau server object
Function: Calls Tableau API to get list of all workbooks
Output: workbookList - a list of all workbooks on the Tableau instance
get_workbook_permissions
Parameters:
server – The connected Tableau server object
wbObject – the workbook object from Tableau
Function: Takes a workbook object and returns a list of users with their permissions
Output: permissionList - a list with the detail permissions for the workbook
get_workbook_views
Parameters:
server – The connected Tableau server object
wbObject – the workbook object from Tableau
Function: Takes a workbooks object and returns a list of all the views in the workbook
Output: viewList - a list of views with some details
update_tableau_groups
Parameters:
server – The connected Tableau server object
groupCompareList – a list of groups generated from the data warehouse function in the dw_helper.py module
Function: This function takes a group compare list and uses it to add or delete groups within Tableau Online so they are synced with the information in the data warehouse
Output: Number added, Number deleted
update_tableau_users
Parameters:
server – The connected Tableau server object
userCompareList – a list of users generated from the data warehouse function in the dw_helper.py module
Function: This function takes a user compare list and uses it to add or delete users within Tableau Online so they are synced with the information in the data warehouse
Output: Number added, Number deleted, Number with role changes
update_tableau_group_membership
Parameters:
server – The connected Tableau server object
group – the group Object for Tableau Online
memberList - a list of members taken from the data warehouse that should be in the given group
Function: This function takes a group object and a member list and makes sure the members in Tableau Online align with what is in the data warehouse.
Output: Number added, Number deleted
dw_helper.py
All of the functions that are used to interact with the datamarq database
insert_rows
Parameters:
connection – the python connection object to datamarq
table – the table name for insert
fieldList – the list of fields to be inserted
valueLIst – the list of values to be inserted
Function: Generic function that inserts values into a given table and fields
Output: s – number of successful rows inserted, f – number of failed rows caught in exception
get_user_compare
Parameters:
connection – the python connection object to datamarq
Function: Executes a query in datamarq to get the userCompareList. The query is specified within the function. It compare what is on the ad staging tables with what is on the Tableau staging tables
Output: userCompareList – a list of users
get_group_compare
Parameters:
connection – the python connection object to datamarq
Function: Executes a query in datamarq to get the groupCompareList. The query is specified within the function. It compare what is on the ad staging tables with what is on the Tableau staging tables
Output: groupCompareList – a list of groups
get_member_compare
Parameters:
connection – the python connection object to datamarq
Function: Executes a query in datamarq to get the memberCompareList. The query is specified within the function. It compare what is on the ad staging tables with what is on the Tableau staging tables
Output: memberCompareList – a list of group members
get_user_membership
Parameters:
connection – the python connection object to datamarq
upn - a users unique upn (marquette email)
Function: Take a user's upn and returns the ad groups they are members of (with tab id if also a member in tableau) based on what is in the staging tables
Output: groups – a list of groups a given user is a member in
update_hash
Parameters:
connection – the python connection object to datamarq
table – the table name
fieldList–List of fields in the table
Function: takes a field list and builds a hash statement to get a unique row hash then runs an update statement
Output: None
cursor_list
Parameters:
cursor – a python sql cursor object (select statement usually)
name – the name of a tuple for each output row of the cursor
fields–list of fields in the output row
Function: Creates a result set (python list) from a cursor execution as a each row a named tupled with defined fields
Output: resultSet – python list of named tuples as each row
update_audit
Parameters:
connection – the python connection object to datamarq
table – the table name
stageSource – The string to be put into the stage_source field in the staging table
Function: Runs commands to update the audit fields in a staging table with the default getdate() for the modified and created date
Output: None
update_dt
Parameters:
connection – the python connection object to datamarq
table – the table name
dt_field – The date to be entered into a datefield
Function: Runs commands to update the given date field to the given date
Output: None
alter_index
Parameters:
connection – the python connection object to datamarq
table – the table name
method – The method to be run on the index (disable, rebuild)
Function: Runs commands to alter all indexes on the table for the given method
Output: None
trunc_table
Parameters:
connection – the python connection object to datamarq
table – the table name
Function: Runs command to truncate the given table
Output: None
Change Log:
Date | Developer | Change Record |
5/24/2021 | N/A | Initial Documentation |
Troubleshooting Notes:
-
Timing of Adding Members to Tableau
Group members need to be added first in the AD group. Then the AD group needs to be synced to Azure AD before it can be pulled back down and the membership/user and by synced to Tableau. Sometime the timing of this is off when the ETL - BI - Users job is run and there is delay adding someone. Usually they are up in Azure AD within 30 minutes and the job can be run without a problem manually. But ocassionally timing will be a problem -
Member Compare Query Issues
This should be resolved but there have been issues fixed with the member compare query. It is a bit of a complicated query because it has to check for membership in a group and also membership in the BI-Tableau group. Membership in the BI-Tableau group is what gets people ultimately into Tableau. If someone isn't being added for some reason, check this query to make sure things show up correclty from the staging tables. Members to be added shoudl be missing the tab data from the query and members to be deleted should be missing the ad data. Members with both sets of data shouldn't be changed.