Architecting AWS for an Alteryx Server

Intro

Deploying your Alteryx Server onto EC2 on AWS is a great way to get your server up and running fast. The problem is that if you treat EC2 just like a virtual machine provided by your IT team, you don’t get any of the security that they would normally deploy. This isn’t a blocker; you just have to take a few additional steps when designing your AWS resources for the Alteryx Server to minimise the security risks.

To build a secure AWS environment, you want a public and private subnet to deploy the compute resources into. Multiple Security Groups should be used to manage server access to all the resources. Finally, an Application Load Balancer should be used to direct external traffic and provide the SSL management.

What are our Design concepts?

When designing an AWS environment for your Alteryx Server, choosing the EC2 instance size is the easy part. You need an instance with the right number of cores for your licence (remembering that Alteryx licensing treats 1 vCPU as 0.5 cores) and enough RAM (generally 32 GB to get started). What we normally recommend is an m5.2xlarge with 8 vCPUs (for a standard Alteryx 4 core licence) and 32 GB of memory.
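As a quick sanity check on the sizing, the licence arithmetic can be sketched in a few lines of Python. The function names here are my own for illustration, and the only rule encoded is the 1 vCPU = 0.5 cores ratio mentioned above.

```python
import math

# Ratio taken from the paragraph above: 1 vCPU counts as 0.5 licensed cores.
CORES_PER_VCPU = 0.5


def vcpus_for_licence(licensed_cores):
    """Return the vCPU count that corresponds to a licence of `licensed_cores` cores."""
    return math.ceil(licensed_cores / CORES_PER_VCPU)


def licensed_cores_used(vcpus):
    """Return how many licensed cores an instance with `vcpus` vCPUs consumes."""
    return vcpus * CORES_PER_VCPU


# A standard 4 core licence maps to 8 vCPUs, which is what an m5.2xlarge provides.
print(vcpus_for_licence(4))
```

Running the same check against a larger instance before you resize is a cheap way to confirm you are not about to exceed your licence.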

The real question we need to answer is what other parts are needed in the AWS setup? How many subnets do you need? Should the subnets be public with an internet gateway? How will you access the instance for maintenance?

When you architect this system, the challenge you face is how to answer each of these requirements while minimising the possible attacks on the server.

What does our Network Stack look like?

The components that make up our networking stack are built around an Alteryx Virtual Private Cloud (VPC), a public subnet and a private subnet.

What Subnets are needed and what Availability Zones should you have?

When building out the network environment we need two subnets, one public and one private. This is because we want to protect the Alteryx host from the outside world in a private subnet (which has no direct internet access). To allow the missing internet access we also need a public subnet with an internet gateway attached (IGW documentation).

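Both the public and the private tier need to cover multiple Availability Zones. Because an AWS subnet lives in exactly one Availability Zone, in practice this means one subnet per zone in each tier. In the public tier this is because the Application Load Balancer (ALB) requires subnets in at least two Availability Zones. For the private tier, it allows for a more robust future scaling pathway.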

When we start deploying the EC2 instances they will be placed in the private subnets. How they get deployed is covered a bit later.
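To make the layout concrete, here is a minimal sketch of how the VPC address range could be carved into one public and one private subnet per Availability Zone. The CIDR range, the /24 subnet size and the zone names are all assumptions for illustration, not values you have to use.

```python
import ipaddress


def plan_subnets(vpc_cidr, zones, new_prefix=24):
    """Carve one public and one private subnet per Availability Zone out of the VPC range."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {"public": {}, "private": {}}
    for tier in ("public", "private"):
        for zone in zones:
            # Hand out the next unused block so no two subnets overlap.
            plan[tier][zone] = str(next(blocks))
    return plan


# Hypothetical VPC range and zone names.
plan = plan_subnets("10.0.0.0/16", ["eu-west-1a", "eu-west-1b"])
for tier, subnets in plan.items():
    for zone, cidr in subnets.items():
        print(tier, zone, cidr)
```

The ALB would sit across the two public subnets, and the Alteryx Server host would go into one of the private ones.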

How to communicate using Security groups

Once you have your environment created you will need to create 2 security groups: one for the public subnet, which allows external access, and one for the private subnet, which only allows access from the public subnet. This arrangement funnels all access to the server through the public subnet, meaning port scanning won’t find any open ports on the server, which minimises the chances of someone getting access to the Alteryx Server host without permission.

The Public Security Group

The public security group will allow access on port 443 for HTTPS traffic. All user traffic should come over HTTPS, with SSL usually terminating at the ALB in the public subnet.

There are 2 more ports that you should consider opening. The first is port 5985, which is used by Microsoft Windows Remote Management, a way of managing the server remotely from the PowerShell command line. It allows you to install drivers, update Windows configuration and modify system settings without needing to open a remote desktop session and use the Windows desktop GUI to make the changes. There are some changes that you can make to the Alteryx Server with PowerShell, but they are not currently well documented or fully supported if something should happen to go wrong.

The other optional port is the Remote Desktop port, 3389. I say that this is optional, but it’s only optional in terms of which security group you place it in. Even if you have opened port 5985 for remote management, you will need Remote Desktop access for most Alteryx configuration changes. You can put this port permission into a separate security group which is then applied to a jump box or bastion host, which I’ll talk about later.

Once you have decided which ports to open, you need to decide where access should be allowed from. For the HTTP/S traffic it is usual to limit access to just the company network IP addresses. If you allow traffic from anywhere, anyone who knows your server URL will be able to access the server and see anything published to your public gallery. In most cases this isn’t the best idea for your server, and ensuring that only those inside your company can access the URL is good security practice.

If you decide to provide the RDP access through a separate security group (which would be the better setup) then you can limit that access even further to the IP addresses of just the server administrators, and that could also be done on an as-needed basis.

The Private Subnet

For the private subnet security group there are 2 sets of permissions you need. The first is for HTTP traffic and RDP access, but only from the public subnet. The second is internal HTTP communication on port 80, which is used by the Alteryx Service for internal communication.

No other source (the general internet, or even other security groups within the company) would be allowed access to the server. Later you will need to add additional ports for access to databases, email sending or other resources, but for the default configuration these two sets of permissions in these security groups are all that you need.
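One way to reason about these rules before touching the AWS console is to model them as plain data and test a few connections against them. The sketch below is only a model, not an AWS API call. The IP ranges are documentation placeholders, the group names are my own, and forwarding from the ALB to the server on port 80 is my reading of the setup rather than a fixed requirement.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    port: int
    source: str  # either a CIDR range or the name of another security group


# Hypothetical placeholder ranges standing in for the company and admin networks.
COMPANY_CIDR = "203.0.113.0/24"
ADMIN_CIDR = "203.0.113.10/32"

SECURITY_GROUPS = {
    "public": [Rule(443, COMPANY_CIDR)],   # HTTPS to the ALB
    "jump-box": [Rule(3389, ADMIN_CIDR)],  # RDP to the jump box, admins only
    "private": [
        Rule(80, "public"),      # traffic forwarded by the ALB (assumed port)
        Rule(3389, "jump-box"),  # RDP only via the jump box
        Rule(80, "private"),     # internal Alteryx Service communication
    ],
}


def is_allowed(group: str, port: int,
               source_ip: Optional[str] = None,
               source_group: Optional[str] = None) -> bool:
    """Return True if any inbound rule on `group` admits this connection."""
    for rule in SECURITY_GROUPS[group]:
        if rule.port != port:
            continue
        if rule.source in SECURITY_GROUPS:
            # Rule references another security group by name.
            if rule.source == source_group:
                return True
        elif source_ip is not None:
            if ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.source):
                return True
    return False


# An admin's desktop cannot RDP straight to the server, but the jump box can.
print(is_allowed("private", 3389, source_ip="203.0.113.10"))
print(is_allowed("private", 3389, source_group="jump-box"))
```

Writing the rules down like this also makes it obvious when a later change, such as a new database port, widens access more than you intended.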

How to talk to the world with an Internet Gateway and Route Tables

An internet gateway is a VPC component in AWS which routes traffic from your VPC to the wider internet (and vice versa). The internet gateway is basically what determines the difference between a public and a private subnet. If the subnet’s route table (your VPC’s router for passing traffic between resources) has a direct route to an internet gateway then it is a public subnet (because the resources can access the public internet). If it does not have a direct route to an internet gateway then it is a private subnet.
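The public versus private distinction boils down to a single check on the route table, which can be sketched as follows. The route dictionaries and the gateway ID are made up for illustration. The only AWS detail relied on is that internet gateway IDs start with "igw-".

```python
def is_public_subnet(routes):
    """A subnet is public when its route table has a direct route to an internet gateway."""
    return any(route["target"].startswith("igw-") for route in routes)


# Hypothetical route tables: "local" covers traffic that stays inside the VPC.
public_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-0example"},
]
private_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
]

print(is_public_subnet(public_routes))
print(is_public_subnet(private_routes))
```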

Even in a private subnet, resources can still have access to the internet (via an internet gateway), but only by an indirect path through a public subnet. In our environment that indirect path runs from the private subnet with the Alteryx Server to the public subnet with the ALB, then finally to the end user via the IGW.

So where do we put our compute resources?

There are two parts to our compute resources, the public access resources (and I include the load balancer in this category) and the private Alteryx Server compute resources.

Where to put the Alteryx Compute?

In a single node environment, we have created some networking resources (like the multiple private availability zones) which won’t be needed by Alteryx. They are there for future scaling capability and as a requirement of AWS.

The compute we need for the Alteryx Server is any Windows EC2 instance that meets the recommended requirements for an Alteryx Server (my base recommendation is m5.2xlarge). This could be customised in future to tune performance as needed, such as moving to an instance with more available RAM or scaling up to faster CPU instances. It would also be the starting point for scaling the environment horizontally by adding worker nodes, which I mention later.

What about Load Balancing?

Getting external traffic to the Alteryx Server will be achieved by deploying an Application Load Balancer (ALB) to accept traffic from the public internet and forward it to the Alteryx Server. In short, there will be one security group allowing access to the ALB and a second allowing traffic from the ALB to be forwarded to the Alteryx Server.

Doing server maintenance with a Jump box

Once you have configured your AWS environment your access to the Alteryx server host should be very limited. The ports to the host are blocked to the outside world so you, at your own desktop, shouldn’t be able to access it either.

To gain access to the server for necessary maintenance (like installing drivers or doing updates) you will implement a solution called a jump box or bastion host. This is a low power server sitting inside the public subnet (meaning there is external access) that you can remote into, and that has network permissions, through a security group, to remote desktop to the Alteryx Server. This can be a very small EC2 instance, as you are just looking to get remote access and not actually do any processing on it.

You could also implement a Network Load Balancer (NLB), but that can add some complexity if you don’t have an additional external process for managing access to the load balancer.

How do we grow with Scaling?

The final consideration is how you will scale this environment. This setup is designed with that use case in mind. With the dual subnets and multiple availability zones already in place, increasing capacity is just a matter of adding extra nodes as required.

The most likely first step for scaling would be to add additional workers. To give some future redundancy, it is recommended to spread the worker nodes evenly across the availability zones. That is, make sure all AZs have a worker before adding a second to any individual AZ.
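The even-spread rule is effectively a round-robin over the availability zones. A minimal sketch, with made-up zone names, might look like this:

```python
from collections import Counter


def place_workers(zones, worker_count):
    """Assign workers to zones round-robin so no zone gets a second worker
    before every zone has its first."""
    placement = Counter({zone: 0 for zone in zones})
    for i in range(worker_count):
        placement[zones[i % len(zones)]] += 1
    return placement


# Hypothetical zone names; 4 workers over 3 zones.
placement = place_workers(["az-a", "az-b", "az-c"], 4)
for zone, count in placement.items():
    print(zone, count)
```

With this rule the worker counts in any two zones never differ by more than one, which is the property that keeps the loss of a single worker zone survivable.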

It is important to clarify that this does not give High Availability in the rare event of an AZ failure. If the AZ that holds the controller and MongoDB happens to go down, then your entire server will be unavailable. If any other AZ goes off-line this redundancy allows the rest of the server to continue to operate.

When you look to provide some redundancy in the persistence layer (migrating from the embedded MongoDB to a self-managed MongoDB), a similar deployment pattern should be followed.

I will cover the full process of scaling a server in AWS in the future, as there are a number of additional considerations when addressing that topic. Decisions like when to scale, what to scale and what resources are needed are all key to the HA conversation.

What have we learned?

When deploying an Alteryx Server in AWS there are lots of components that are often not considered in the rush to get a proof of concept off the ground. I’m going to release a CloudFormation template that deploys all the resources described in this blog soon, but this should provide the basis of what I think is the best way to get started.

If you have any suggestions on what you think is a better way to set up your environment I would love to hear it.

