Tuesday, 16 August 2022

CReW Expect Records, or "What to expect when you're expecting data"

 This morning I awoke to a WhatsApp message from my friend, @grossal (Alex) asking the following:

[5:12 AM, 8/16/2022] Alex Gross: Is there a CReW "Expect Records" Macros? 

I see a lot of time Use Cases, e.g. Database Connections or files where I need to check if records come in. I think this could also be useful for the SHA256 Macro.

I just had the scenario where I connected to a database and got zero records back - but it didn't crash my workflow because the metadata was still provided by the db.

Alex had already explored the macro updates to either create a cloned copy of the existing macro or to include functionality to choose between expect 0 or expect something.  It would be easy.  

Easy is in the eye of the beholder.  I knew that I'd have to change the image and that change would complicate everything.  I also don't want to interfere with any existing use of the macro.  I opted for another macro.  Of course, I'd opt for Alteryx to include an ERROR condition on tools that they should allow for an ERROR if 0 records are passed through them.


Another vertical workflow!  Instead of using a COUNT RECORDS macro, I opted for using the JOIN tool (configured with join by record position).  When 0 records are present in the I (Input) anchor, then the "Error Message" data is passed to the MESSAGE tool.  That tool is configured as an "ERROR" and passes the contents of the TEXT INPUT tool to the Results panel.

This method is ever so slightly more performant.  But more importantly it doesn't include another macro inside of it.  That simplicity appealed to me.  When I can use fewer tools and less macros (oxymoron) I think that I've got a winner.  After testing with and without AMP and testing just to see that it works I am confident enough to let you, my friends check it out too.

You can find the macro in the Alteryx Community (Gallery).  Right now I'm getting ready for Alteryx & Beer with Alex and the German Alteryx Community.

Cheers,


Mark

P.S. During the Alteryx & Beer session, I was coerced into creating an output anchor for the macro.  If data is present (e.g. NOT an error), the data will pass through the macro.  This was an ask and so it was made to happen.

P.P.S  If you have a LARGE set of data, the "Count Records" method will literally count every record.  This method only requires 1 record to know if things are kosher or not.



Wednesday, 3 August 2022

CReW SHA256 Macro: Enter the Python

 Did CReW just freeze over?

CReW SHA256 Macro

This wasn't the first rodeo for me, but it is the first time that I've publicly created a macro that uses the python tool in Alteryx.  I pride myself in doing things with BASE-A (100% Alteryx) whenever possible.  That happens virtually 100% of the time.  But until Alteryx adds SHA256 (or other encryption) to their products (note:  a FIPS version of Alteryx is available help.alteryx.com), this macro will get SHA256 encryption on your current (2022.2) version.

Where is the macro?

This link is at the top of the article for those who don't care about the making of the macro....


How does it work?

Select a single data field (String Type) and it will be SHA256 encrypted as:  [SHA256_Output].  There are warnings if EMPTY data is present or if the incoming data is NOT lowercase (the macro will automagically lowercase the input field).  If you don't want these features turned-on, you can disable them in the macro configuration.

The entire record is read and output by the macro.  Only the SHA256Input field is passed into a python tool which uses HASHLIB and encrypts your field.  If the incoming data is Empty(), it is bypassed.  If you have lots of duplicate data, you might want to UNIQUE that data and only pass the unique values through the tool.  It runs about 2,000,000 records per minute on my computer (with AMP on).  Without AMP, it runs about 50% slower for this function.  

Memory and AMP and Python can lead to shortages of memory.  Use caution.  If you experience memory-related issues, you might want to keep things simple.  I ran 200 million records through this process in less than 1.5 hours while testing.

What's inside the box?

Input:

Your incoming data is routed through a common RecordID tool and the entire record circumvents the encryption process and awaits the results in a JOIN tool (append to end of record).

Process-Prepare for Encryption

A formula tool assigns the selected (Interface Drop-Down) string field into the __SHA256Input__ field.  Conversion warnings (MESSAGE) are issued if the input field is Empty().  Conversion warnings are also issued if the input field contains uppercase letters.  The input field is converted to lowercase() for all records.  If these function offend you, you can deselect the execution of the preparation functions in the configuration (via CheckBox that updates the DETOUR).

Python - SHA256 Encryption

Empty data is filtered and then unioned to the results of the encryption.  Data then goes into the Python tool and the following script is executed (please note that the script is saved as a comment inside of a disabled container):

#################################
from ayx import Package
from ayx import Alteryx
import hashlib
#################################
df = Alteryx.read('#1')
#################################
def text_to_sha256(text):
    h=hashlib.new('sha256')
    h.update(text.encode('ascii'))
    return h.hexdigest()
#################################
df['SHA256_Output'] = df.apply(lambda row: text_to_sha256(row['__SHA256Input__']),axis=1)
#################################
Alteryx.write(df,1)

The python tool reads #1 input and outputs via #1.  The imported library, hashlib, is used to created a function that converts text to sha256.  A dataframe for output uses a lambda function to parse the data from __SHA256Input__ and creates your SHA256_Output field.

In an effort to minimize the python use of memory (as tested python uses 50% more memory than Alteryx was consuming), only the RecordID and SHA256Input are passed through this tool.  The tool is configured in "Production Mode".  A copy of the script is saved as a comment as friends of mine tell me is a good practice (sometimes it is lost otherwise).

Error Checking:

What could go wrong?  I can't imagine records being lost in the join, but should it happen there are two (2) CReW Expect 0 macros waiting to throw error messages if any unjoined data tries to escape the macro.


In Closing:

It has been a while since I've posted.  The community gallery is where I've been stashing macros.  Every once in a while I hear questions about the use of CReW while using AMP.  On my to-do list is to look through the parallel-BUD tool and modify it to be AMP compatible.  In reading community posts it seems that it might not work without compatibility mode enabled.  If you have any CReW or AMP feedback/concerns (related to CReW), please let us know.

A special note of thanks to Alex Gross (AYX Community @grossal), ACE and friend who helped to get through some python woes.

Cheers,

Mark