Friday, 9 October 2015

Crew Macro Pack - 2015 Q3 Release

It it with great pleasure that I can announce the next release of the CReW macro pack.

It is available for download now at


Or for anyone still on 9.5


It has been a long time coming I know, but I hope you enjoy the new content.  I'm pleased to say this pack has three new contributors from the community, something which I hope we can continue with future releases.

Installation Instructions


  1. Download the zip file above
  2. Extract it to a location on your machine where you want to keep the macros.  (Not in the Program Files/Alteryx folder or an existing Macros folder)
  3. Run Install.yxwz

Community Supported


These macros are not an official Alteryx release and as such they have not been through the full testing process that tools in the product have.  Some of them are prototypes and experimental, but that is some of what makes them so fun.  

If you find an issue let me know either in the comments below or drop me personally a note and I'll take a look, or if you can see how to fix it yourself then even better!

What's New?


New Macros

Wildcard XLSX Input - This macro was contributed by Joe Mako and came from a discussion on this community post.  Ever wanted to read in multiple XLXS files and worksheets?  Then this is the macro for you!  (This is the only macro in the pack which is currently missing from the 9.5 version as it makes use of 10.0 features)

Description - "Reads in multiple sheets from multiple Excel files from a directory. Optionally can filter out sheets, can pull field names from a header row, can auto configure field types, can sample N random sheets, and if the headers do not match then raw data for the sheets that did not match will be in the 'N' output."


Chi Square Test - Our second new macro author this month, this one is from Mark Frisch or MarqueeCrew another very active member of the new community site.

Description - "Have you created a sample set and wanted to know if there was a bias in the sample?  Random doesn't always mean you get an even quantity of data across your variables.  This test will check to see if the distribution of domain values (for a specific variable) are representative of the parent population.

Best to double check your samples before you run your tests!"


Moving Summarize - This is a macro I wrote almost a year ago now and featured on this blog here.  But it really has been that long since the last macro pack release!

Description - "Calculate moving averages or any type of moving summarize"





Trim Mean - This macro was written by me and was inspired from the recent Alteryx roadshow in London, with a user requesting this functionality which is available in Excel.

Description - "Returns the mean of the interior of a data set. Trim Mean calculates the mean taken by excluding a percentage of data points from the top and bottom tails of a data set. You can use this macro when you wish to exclude outlying data from your analysis."

Bug Fixes


Missing Tool quick descriptions - A big thanks to David Zurbuchen for both spotting these were missing, but better still taking the time and effort to fix them.  A great example of how you can contribute to the macro pack with small but important fixes.

Runner Macro fixes - There were a number of issues with the runner macros in the first release of them.  Many thanks to all of the people who both reported them and spent time with me helping to work out exactly what it was that caused them.  There is one outstanding issue that I know of which involves uni-code characters in the module path.

Add Totals Macro - Now works when there are spaces in the field names.

Get Involved


I want to thank again the contributions from Joe, Mark and David.  You guys have given me the motivation to get this going again.

And on that note I want to encourage anyone else who is interested in getting involved to get in touch either in the comments here or via email.  I really want to make this a community effort and try and make the releases a bit more regular.  There is plenty to do on the existing macros, samples, help files and I also want to try and make some more progress on my Test Suite macros.

Till next time.

Thursday, 23 April 2015

Testing Alteryx Macros - Part 4 - Testing For Errors

At the end of my last post we decided the next test to write was

"If a user tries to total a string column they get a meaningful error message."

This is going to require some new functionality to the test framework; as for this test to pass we will need the Alteryx module for it to fail (and not just fail, but fail with a very specific message).  I am going to start with how a user of the test framework (i.e. a macro developer) would do this and then we'll take a quick look at how the test framework works for this feature.

Write a Failing Test

For this test we will use a new macro in the test framework the Expect Error macro:


And we write a test which looks like this:

  • The Text Input has a single Text field called field 1 containing "Some Text"
  • The Expect Error macro has a single configuration option "Expected Error Message" which we set to the error message we expect to get in this scenario "Not possible to total Field1 as it is a non-numeric type: String"
Now if we run our test runner module we get the following output


So we get a message telling us what error message we were expecting to get from which tool(s) and then messages telling us what error messages we actually got (in this case the standard formula message about trying to use a string when a number was expected).

We can make one slight improvement to this test, in that we know exactly which tool we are expecting the error message to come from, so we should specify that in the test.  We do that by enclosing the Expect Error tool in a tool container with which ever tools we expect the error to come from:

Giving us the following log from the test runner:


Make The Test Pass

Having a failing test we can now add functionality to the macro to make it pass.  Which is a fairly straightforward process:


  • We already had a Field Info tool which gave us the types of all of our fields.
  • New filter tool filters off any fields of type String.
  • Then a message tool throws the correct error message to the macro user configured like so:

Run the Test Runner again and:


Success! We have written our first unit test which tests for an error.

So How Does It Work?

I said at the start that after we had used the new functionality we would take a look at how it worked.  This detail is not necessary to be able to successfully use the test framework, but might be interesting to see what's going on behind the scenes.

The Expect Error Macro

The contents of the Expect Error Macro are probably the most boring Alteryx macro you will ever see.  It looks like this:
That's right apart from the question it is empty!  All of the "magic" happens in the Test Runner macro.

The Test Runner Macro

This is looking a bit more complicated than last time we saw it and actually probably could do with being re-factored into a few sub macros:


  1. The macro here looks inside each of the Test Modules and parses out the Tool XML.  It is looking for Test Modules which contain the Expect Error macro.
  2. Modules which don't contain the Expect Error macro go this top route and get run as we saw previously.  If the module errors it fails the test and reports to the output log.
  3. Modules which do contain the Expect Error macro are also run with the Runner macros but this route the test fails if the test module runs without errors.
  4. This section again parses out the module XML and works out what error messages and from what tools the Expect Error macro is expecting.
  5. This join tool matches the expected error messages against the error messages which came out of the log.
  6. The final section of tools deals with reporting any mismatches between the expected and actual error messages and reporting the success or failure of the test.

Wednesday, 18 March 2015

Testing Alteryx Macros - Part 3 - Comparing Data Values

OK time to write some more tests.  But first we need to add some more functionality to our "Expect Equal" test macro.  If you recall from out last post, currently all it does is check if the column names match between the expected and actual data.  Our next series of tests are going to consider the data in those tables which is going to need some new functionality to tell us if the data differs.

It looks like this:

The old functionality is still there above these tools, but I omitted it from the screenshot for clarity.

So what is this doing?  Well the Record ID tools are adding a row number onto both datasets and then the transpose tool uses this as a key and all other fields (importantly including unknowns) as data fields.

The Join then joins the two streams together on the row number and field name.

Finally the message tool is configured:


Whenever the data differs we get a nice error message:

Message (30) Unexpected Value - Field:Total Row:1 Expected:1 Actual:0

With this in place we can write our next tests.  So just a quick reminder of the TDD work cycle:
  1. Write a test for a piece of functionality which will fail.
  2. Write some code which makes the test pass.
  3. (Optionally) Re-factor your code.  Without adding any new functionality and without breaking any tests.
and we are off.

Write a Failing Test

The next smallest piece of functionality I could think of is this:

"Single Field Named Field1 With Value 1 Gives Total of 1"

I add a new module to my test folder called Single Field Named Field1 With Value 1 Gives Total of 1.yxmd and copy the module from my first test


All I change is the incoming test data to

and the Expected data to

Then I save the module and run my test runner which gives me this log


Great a failing test with a clear and precise error message.

Make The Test Pass

So next the smallest amount of code to make the test pass.  Easy open up our Add Totals macro


And just change the hard coded formula value from 0 to 1.  Run the tests again



Success!

Write a Failing Test

Well I can quickly see where my current implementation is going to fall down.  What if my user's data contains something other than the value 1?  So let us write a test to exercise that:

"Single Field Named Field1 With Values 1,2,3 Gives Totals 1,2,3"

So again copy and paste one of our existing tests and update the data

Incoming test data

Expected data
And run the tests

Fails as expected and again a pleasing and exact error message.

Make The Test Pass

Again easy enough to make this test pass.  We just update our formula to [Field1]

and


Oh dear.  Our new test is passing, but our first test "Adds A Column Called Total" has failed.  What's happened?

So the error message is telling us that our new formula is giving a string result.  If we recall our first test our fields didn't have any data in them at all.  This means that the text input tool has no data to use to determine the type of Field1 so defaults to a type of V_WString.  So yes trying to use that in a formula which should result in a double is an error.

But that's not a very good error for users of our macro.  It refers to internal tools in our macro and doesn't tell the user what to do to fix it.  We can do better than that, but not right now.  Right now we are working on making our tests pass having added a new one.  We shouldn't be adding any new functionality.

In TDD it is often helpful to have a small list somewhere (I tend to scribble on a pad on my desk) of tests that you are going to add in the near future, then when you find things like this you can simply add them to the list and get back to the actual problem you are working on.  So we'll do just that

Tests To Add
- If a user tries to total a string column they get a meaningful error message.

This is some of the strength in the TDD approach you are exploring ways in which your macro will be used and can break as you develop it.

So back to the problem at hand.  Our first test was never meant to test anything about wrong field types, it was just supposed to check the Total column got added.  So let's just fix up the test with a couple of select tools to force our columns to be numeric types

and


All good!

Write a Failing Test

"Two columns Field1, Field2 With Values 1,2,3 Gives Totals 2,4,6"

Again the test looks identical to our previous ones just with updated Test and Expected data.



Make The Test Pass

Now we're beginning to get to some real implementation.  The naive fix of updating our formula to [Field1] + [Field2] isn't going to cut it as it would fail on the tests which don't have Field2.

I need to dynamically get the names of my fields, use them to build up my formula and then apply that formula to my data. 

Which implemented looks like this

Quick overview from left to right:
  • The field info tool gets all of the field names
  • The summarize tool builds the total formula by concatenating the field names with a + separator
  • The formula tool sets up the Type, Size and Field Name that we need for the dynamic formula tool
  • The dynamic formula tool applies the formula to the data
Run the tests


and not only do we know we have implemented out latest piece of functionality, but we can also see that we have not broken any functionality that we previously added.

Next Time

Now we are beginning to get something that is beginning to look like it might be useful.  Looking at my list of future tests I think now is going to be a good point to tackle that better error message if my user tries to total a string field, which we will look at in my next post.

Wednesday, 11 March 2015

Testing Alteryx Macros - Part 2 - Our First Test and the Start of a Test Framework

At the end of my previous post I talked about the need for a test framework to start writing our tests.  I am going to build it out piece by piece as I need it.

Recall our first test was going to be

"The macro will add a new column called Total"

So what does a test look like for this?  Well it is an Alteryx module like this:

Let me talk you through it tool by tool.

The bottom left text input is our test input data.  It is configured like so


It is a single field called Field1 and has no data.  Having no data might seem a little strange, but if you look back to our test description, all we are testing is that a new column is added, at this point we don't care about the data just the columns.  Testing the data will be a future test.

The blue circle is the Add Totals tool we are building out, except that remember we haven't added any functionality yet, so right now all it does is pass the input data straight through and looks like this:

The top left text input is our expected data.  It is configured like this


So again we have no actual data, but this is setting the expectation of what we expect our macro to output in order for us to say it passes the test "The macro will add a new column called Total".

That is to say if we put a browse tool after the expected data text input and one after the macro we want the contents of those two browses to be identical.

Which brings us to our final tool and the first part of my test framework that I have built.  It is the "Expect Equal" macro.  The premise of it is fairly simple:  It takes two inputs: an Expected Data and an Actual data and if they differ will throw an error.  Hopefully the error will be detailed enough to quickly explain the difference.  

As I mentioned at the start of this post I am building up the functionality of the framework as I need it.  So right now all it compares is the field names on the two data streams, like so:

So we have a field info tool on each input. We join on name and then collect the mismatches.  If there are any we summarize these into a single record and use a message tool to issue an error message.

All of this together means that if we run our first test module (the one at the top of this page) we get the following output log:


Our first failing test!  With a useful error message telling us the field "Total" is missing from our actual data (of course because we haven't added anything to create it yet).  A good start to our TDD approach to macro building.  

Now before we go on to writing some code to make our test pass I want to look at a better way to run our tests.  My plan is that each test that we identify and build will be a separate Alteryx module and the name of that module will be the name and condensed description of the test.  That way if we break one of our tests we get fast feedback about what we broke.

My first module is called "Adds A Column Called Total.yxmd"  (I can use spaces in windows file names and it improves readability so why not?)

I will keep all of the tests together in a folder and what I want is a one button way to run them all at once.  And that brings us to the second part of our test framework the "Test Runner" macro


This macro will look in a particular folder (where we have saved all of our tests) and run each of the tests in turn and report a pass or fail for them.  So right now it produces this output log


From this log we can see straight away which test is failing and also the error message which is causing it to fail.

And how does the Test Runner macro work?  By using the CReW runner macros of course!

A quick run through of the tools from left to right:
  • The Directory tool picks up all of our test modules from the folder we have saved them in.
  • The List Runner macro runs each of those tests.
  • The first Message tool issues a passed info message for tests which do not have errors.
  • The second Message tool issues a failed error message for tests which have errors.
  • The Log Parser macro breaks the logs out into a record per row.
  • The final Message tool issues info messages for the errors that occurred in the failing test the second row that you can see in the log above.
  • The Browse is just there for testing.
It was a great feeling when I realized that the runner macros were exactly what I needed to solve this problem.

So all there is left for us to do is write some functionality to make our test pass and that functionality is as simple as adding a formula tool to our macro

The macro creates a new field called Total and hard codes it to zero.  "Wait a second" you might be saying, "that doesn't seem very useful".  And right now it isn't.  But a guiding principal of TDD is small, fully tested steps.  Write just enough test to make it fail and then juts enough code to make it pass.  And right now we have no tests about what data is in the field, just that the field is created.  So that formula tool is "just enough" code to make our test pass; and if we save the macro and re-run our test runner macro we get the following log


Success! We have completed our first TDD cycle.  Given we have just written the code there really shouldn't be anything to re-factor at this point so time to write our next failing test.

We will continue to build out this macro and the Test Framework in the next post.

Wednesday, 4 March 2015

Testing Alteryx Macros - Part 1 - A Question

I have been meaning to write a post about testing Alteryx macros for quite a long time now, but for some reason have never quite got started on it.  I hope that now is the time that I can finally get some of my thoughts down in writing.

And this series of posts is going to a start with a question.  A question whose answering I hope will take me to all the places that I want to talk about regarding testing Alteryx macros.  As I write this first post I don't know the answer to my question, but I also have a feeling that I will still achieve what I want from these posts regardless of what that answer may be.  After all "Life is a journey, not a destination".

So here is the question that I want to answer:

"Is it possible to use Test Driven Development (TDD) to develop an Alteryx macro?"

For anyone unfamiliar with TDD then I recommend a quick read of the Wikipedia article here which will fill you in on the basics.  But the work cycle looks like this:

  1. Write a test for a piece of functionality which will fail.
  2. Run the tests and see the test fail.
  3. Write some code which makes the test pass.
  4. Run the tests and check they all pass.
  5. (Optionally) Re-factor your code.  Without adding any new functionality and without breaking any tests.
  6. Repeat.


Working in this way means you add functionality in small and fully tested steps until you have built up a larger feature.  The nice thing with knowing that all the functionality is tested is that when you come to re-factor in step 5 you can do so confident in the knowledge that you haven't broken anything.

So how do we apply this to building an Alteryx macro?  Well let's have a go.  

For this example I am going to try and rebuild the CReW macro "Add Totals".  This is a fairly simple macro which allows the user to select a number of numeric columns and then adds a total row and total column on to the data table.  



This seems a good macro to start with as it has a reasonable amount of complexity (but not too much) and also has the benefit that I wrote it a long time ago and have no idea how it currently works.  I intend to build my new version without looking and then compare the two at the end.

OK so let's start.  Step 1 we need to identify a small piece of functionality and write a test for it which fails.  So I can imagine that the first piece of functionality we might identify is adding just a total column for a data stream which has a single field called Field1 on the incoming data stream.  Not very useful in itself, but is the smallest first step I can think to take and TDD emphasizes small steps.

So how do we write a test for that?  Ah... We have hit our first problem.  For other programming languages, where TDD is regularly used, there exist unit test tools.  Right now for Alteryx there really isn't an available tool for us to use.

Now as any of you that know me well will know, one of my mantra's with Alteryx is "if it doesn't exist then build it" and so building a Alteryx testing tool is going to be stop 1 on our journey towards TDDing an Alteryx macro.  Look out for my next post when will we start to do just that.