Well, another of those absurd, incredible, unbelievable bugs. The kind that break your day, cost you time and energy, and ruin your mood, all because of some sloppy programmer or an all-too-complicated environment.
Brief explanation of the situation:
- I have a host running ESXi 5.1 with several virtual machines.
- There’s a pfSense firewall in a virtual machine in front of everything, connected to the only vSwitch with internet access.
- Within vSphere I created several vSwitches that are not connected to any physical network interface, to manage the various network zones.
- I attached one virtual network interface to the firewall for each vSwitch, so that the firewall can manage the zones and selectively allow them to connect to each other and to the internet.
One beautiful day I had to create a new network zone. This is not a tutorial on adding a new vSwitch, so I’ll assume you know how to do it.
After creating it, I added the new interface to the firewall’s virtual machine, then I added it in the firewall itself. For the moment this isn’t a mission-critical server, so I decided to restart the firewall.
Then, BOOOOOOOM!!!! I couldn’t access any of the servers behind the firewall. ANY! Strangely, I could still reach the firewall and, through some tricks, vSphere directly. From the firewall I tried to ping the VMs: nothing. From the VMs I tried to reach the firewall: nothing. The firewall was broadcasting ARP requests; the VMs told me the host was down.
At first I thought it was something related to ESXi. Then I found that I could ping from VM to VM behind the firewall, although they couldn’t reach the internet or the firewall. So maybe it wasn’t a vSphere problem but a virtual machine problem. But then I checked the configuration of the firewall and everything seemed to be OK.
So I thought: “Maybe it’s something in the configuration of the virtual machine!” So I checked, and I noticed something very strange. NETWORK INTERFACE NAMES WERE IN THE WRONG ORDER.
Let me explain. When you add network interfaces this way, they seem to show up in the order you added them, or maybe in alphabetical order. This time, the last interface I added wasn’t the last one in the list, and the order certainly wasn’t alphabetical. So a doubt occurred to me: maybe something had messed up the association between MAC addresses and vSwitches?
I compared the MAC address to network zone associations in the firewall with the ones in the virtual machine configuration, and that was exactly the case: the MAC addresses were attached to the wrong networks. As soon as I changed the network assignments in the VM configuration to match the ones in the firewall, everything worked correctly again.
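If you’d rather not do that comparison by eye, here is a rough sketch of it in Python. Everything in it is made up for illustration: the MAC addresses, the zone names and the `find_mismatches` helper are hypothetical, and it assumes you name the network on the vSphere side the same as the zone on the firewall side. You’d type in the real values by hand from the VM settings and from wherever your firewall lists its interfaces.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and use ':' as separator so both sides compare equal."""
    return mac.strip().lower().replace("-", ":")


def find_mismatches(vm_nics, firewall_nics):
    """Return {mac: (vm_network, firewall_zone)} for every MAC where the two sides disagree.

    A MAC that is missing on one side shows up with None for that side.
    """
    vm = {normalize_mac(m): net for m, net in vm_nics.items()}
    fw = {normalize_mac(m): zone for m, zone in firewall_nics.items()}
    mismatches = {}
    for mac in sorted(set(vm) | set(fw)):
        if vm.get(mac) != fw.get(mac):
            mismatches[mac] = (vm.get(mac), fw.get(mac))
    return mismatches


# Hypothetical data: MAC -> network, as shown in the VM configuration.
vm_nics = {
    "00:50:56:00:00:01": "WAN",
    "00:50:56:00:00:02": "DMZ",
    "00:50:56:00:00:03": "LAN",
}

# Hypothetical data: MAC -> zone, as the firewall expects it.
firewall_nics = {
    "00-50-56-00-00-01": "WAN",
    "00-50-56-00-00-02": "LAN",
    "00-50-56-00-00-03": "DMZ",
}

for mac, (vm_net, fw_zone) in find_mismatches(vm_nics, firewall_nics).items():
    print(f"{mac}: VM config says {vm_net}, firewall expects {fw_zone}")
```

With the sample data above, the two swapped interfaces are reported and the WAN one is not. An empty result means both sides agree.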