Last updated: 4 September 2019
The information in this post was crowdsourced, thanks to the IT Security SIG set up by Nigel Rodrigues. Many members contributed, and the candid discussion inspired me to write this article.
As this incident is still developing, this article will be updated with the latest information; what you see here is a snapshot at one point in time.
On 21 August 2019, the KLIA/KLIA2 airports began to experience system and technical difficulties. The failure affected check-in counters, flight information display systems (FIDS), baggage handling, the airport mobile app, as well as payment systems which rely on the airport networks. Understandably, tempers and dissatisfaction among airport users ran high.
20 August 2019 – MAHB signed a MOU with Huawei on technology modernization.
21 August 2019 – KLIA/KLIA2 reported system/technical issues affecting multiple systems in the airport. Initial news indicated a failure of network equipment.
22 August 2019 – The Star reported that MAHB expected the situation to be resolved by 23 August 2019, as it had received new equipment to replace the existing units, with testing to be conducted that same night.
23 August 2019 – MAHB updated their website (as at 6am) explaining that they are in the midst of stabilizing their system and had deployed additional buses to ferry the passengers to their respective terminals.
24 August 2019 – The Malay Mail reported that the situation had improved, with passenger flow reported as smooth apart from intermittent disruptions.
24 August 2019 – NACSA issued a statement affirming that no cyber attack had caused the network issue at KLIA/KLIA2.
25 August 2019 – The Malay Mail reported that KLIA/KLIA2 operations had been restored to normal, based on a check by BERNAMA at 0930.
26 August 2019 – Ministry of Transport announces a panel to investigate the system failure of TAMS (Total Airport Management System). NACSA is one of the members of the committee.
26 August 2019 – MAHB was quoted in a statement as saying that it was not dismissing the possibility that malicious intent may have caused the incident.
26 August 2019 – Airport passengers stated that it was not a full service recovery: the information system was still down and the airports were operating at partial system availability.
27 August 2019 – Airlines seek compensation from MAHB over the airport system downtime.
27 August 2019 – MAHB lodges police report over possible malicious intent being the cause of the downtime.
28 August 2019 – PM orders probe into the airport downtime incident.
29 August 2019 – PDRM said to be probing four individuals in relation to the airport system failure, based on the report made by the IT division's senior general manager.
30 August 2019 – AirAsia, a Malaysian carrier, is said to have confirmed that it will not sue MAHB over the recent airport system failure.
2 September 2019 – Police said to have recorded statements from 12 MAHB staff over the system failure incident.
3 September 2019 – Four pioneer MAHB IT officers lodge counter police reports against MAHB. They had been suspended, and claim they were falsely accused.
The details are vague; however, the incident was attributed to a faulty IP network switch which caused IP network traffic to grind to a halt. The switch in question seems to be the core switch, which processes all the network traffic for the airport.
A core switch is usually responsible for traffic between segments and also acts as an aggregation point. Each area is connected via a smaller switch, which points to an intermediate or aggregation switch, which in turn leads to the core switch. In this case, since the core switch was down, the segments were disconnected from one another, even though each machine would still show its network connection as connected, because the link to its local switch remains up. Access to upstream networks, such as the Internet used for the credit card payment gateway, was also interrupted, as traffic stops at the core when it is not working.
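To make the failure mode above concrete, here is a minimal sketch in Python. The switch names and links are purely hypothetical (the real KLIA topology is not public); the sketch only models a generic access, aggregation and core hierarchy and checks what a check-in segment can still reach once the core is removed.

```python
from collections import deque

# Hypothetical three-tier layout: access switches -> aggregation -> single core.
# None of these names come from MAHB; they are for illustration only.
LINKS = [
    ("checkin-access", "terminal-agg"),
    ("fids-access", "terminal-agg"),
    ("baggage-access", "airside-agg"),
    ("terminal-agg", "core"),
    ("airside-agg", "core"),
    ("core", "internet-gateway"),
]


def build_graph(links, failed=frozenset()):
    """Build an undirected adjacency map, skipping links that touch a failed node."""
    graph = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start):
    """Return every node reachable from start using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


healthy = reachable(build_graph(LINKS), "checkin-access")
core_down = reachable(build_graph(LINKS, failed={"core"}), "checkin-access")

print("healthy:", sorted(healthy))
print("core down:", sorted(core_down))
```

With the core removed, the check-in segment in this toy model still sees its own aggregation switch (which is why machines report a live link) but can no longer reach the baggage segment or the upstream gateway.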
Social Media Buzz
It was noted that a user claiming to be a subcontractor to MAHB said that the network switch was 17 years old and had never been replaced. This is unconfirmed, pending an official statement from MAHB.
A report from Utusan Malaysia also mentioned something similar; an excerpt is mentioned here.
Just one day before the incident, on the 20 August 2019, MAHB signed an MOU with Huawei “to drive MAHB’s digital transformation framework by enhancing connectivity and real-time information by connecting all stakeholders in one fully integrated digital ecosystem. The collaboration would also seek to set up a fully integrated network communication managed platform to manage above technology and integrated data to enable future big data analysis throughout the entire airport, further improving airport operation efficiency and reduce overall ICT cost.”
It is probably sheer coincidence that the network equipment failed the very next day, seemingly catapulting the priority of this initiative.
At this point, the lack of official news seems to have led to a lot of speculation. The first theory was that the airport was under a cyber attack. This was quickly quashed by NACSA, which confirmed that there were no attacks.
Another discussion led to the belief that there should have been sufficient DR (Disaster Recovery) infrastructure to ensure business runs as usual. Assuming the social media reports were right, most networks designed at that time would have had a typical star topology, whereby layer one connectivity cascades back to a single core switch. Using Cisco as an example, a spine and leaf architecture would have allowed traffic to be redirected to a different core (spine) switch, had that been the architecture. Spine and leaf is still a relatively new concept, and there may be other designs which an organization can adopt.
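The difference between the two designs can be sketched as a search for single points of failure. The Python below is a simplified illustration with hypothetical node names: it removes one non-leaf switch at a time and checks whether the leaf switches can still reach each other. It deliberately ignores routing protocols, link capacity and everything else a real spine and leaf fabric involves.

```python
def is_connected(links, nodes, failed):
    """Check whether all given nodes can still reach each other with one node failed."""
    graph = {n: set() for n in nodes}
    for a, b in links:
        if failed in (a, b):
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    start = nodes[0]
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return all(n in seen for n in nodes)


def single_points_of_failure(links, leaves):
    """Return the non-leaf switches whose failure alone disconnects the leaves."""
    candidates = sorted({n for link in links for n in link} - set(leaves))
    return [n for n in candidates if not is_connected(links, leaves, failed=n)]


LEAVES = ["leaf1", "leaf2", "leaf3"]  # hypothetical access-layer switches

# Star: every leaf cascades back to one core switch.
STAR = [(leaf, "core") for leaf in LEAVES]

# Spine and leaf (simplified): every leaf connects to every spine.
SPINE_LEAF = [(leaf, spine) for leaf in LEAVES for spine in ("spine1", "spine2")]

star_spof = single_points_of_failure(STAR, LEAVES)
spine_leaf_spof = single_points_of_failure(SPINE_LEAF, LEAVES)

print("star:", star_spof)
print("spine and leaf:", spine_leaf_spof)
```

In this toy model the star design reports the core as a single point of failure, while the two-spine design reports none, since losing either spine still leaves a path through the other.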
MAHB had been mobilizing its own staff, recruiting volunteers and promoting initiatives to get them to assist passengers during these trying times. A poster dated 22 August 2019 was seen circulating on social media, asking staff to assist with the situation at KUL during peak hours (12 – 2pm & 4 – 10pm).
MAHB exhibited a strong understanding of the airport processes, being able to fall back on manual processes and sheer manpower to handle the airport's operations while the system was down.
Assuming the theory about the 17-year-old network equipment is true, there can be 2 possible outcomes. In the first, an overzealous CIO might end up saying “We should sweat our assets more, make sure you don’t buy anything new for the next 15 years! (BTW are we using the same brand as the airport?)”. Scary, to say the least! It is worthwhile to remember that computer and network hardware is susceptible to degradation over time, even down to the copper network cabling, hence some data centers make it a point to “re-cable” their infrastructure periodically! Other views include “we’re not an airport, we won’t need to worry about it”.
The second outcome is that investment in IT now becomes justifiable, as part of a technology refresh. A more prudent approach to the technology life cycle emerges, and the MAHB story becomes a talking point at the Board level, raising the question of whether the assets in use are still (1) maintained, with the necessary support, and (2) prior to End-of-Life/End-of-Support. This is in line with managing tech debt, ensuring that such compounding interest doesn’t suddenly pop up!
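The two Board-level questions above can be turned into a simple recurring check against an asset register. The sketch below is only an illustration: the `Asset` fields, the one-year warning window and the sample inventory are all assumptions, not anything published by MAHB or a vendor.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str
    end_of_support: date      # vendor End-of-Support date (hypothetical values below)
    under_maintenance: bool   # is there an active support contract?


def review(asset, today, warning_window=timedelta(days=365)):
    """Return a list of life-cycle concerns for one asset; empty means no concern."""
    issues = []
    if not asset.under_maintenance:
        issues.append("no maintenance contract")
    if asset.end_of_support <= today:
        issues.append("past end-of-support")
    elif asset.end_of_support - today <= warning_window:
        issues.append("end-of-support within warning window")
    return issues


# Hypothetical inventory, for illustration only.
INVENTORY = [
    Asset("core-switch", date(2012, 6, 30), under_maintenance=False),
    Asset("agg-switch", date(2020, 3, 31), under_maintenance=True),
    Asset("access-switch", date(2025, 12, 31), under_maintenance=True),
]

today = date(2019, 9, 4)
report = {asset.name: review(asset, today) for asset in INVENTORY}

for name, issues in report.items():
    print(name, "->", issues or "ok")
```

A report like this, run periodically, gives the Board a concrete list of assets that are accruing tech debt before one of them fails.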
Lessons learnt – so far
1. Have manual processes that will stand in if something fails. Can you operate without technology?
2. Understand the implications of tech debt. It’s a matter of time before it catches up, and the organization then pays the compounding interest. Reputational damage becomes severe and takes time to recover from.
- Malay Mail – https://www.malaymail.com/news/malaysia/2019/08/23/mahb-network-failure-caused-systems-disruption-at-klia/1783638
- MAHB Official PR – https://www.malaysiaairports.com.my/media-centre/news/klia-network-disruption
- NACSA PR – https://www.nacsa.gov.my/doc/Press_Release_MAHB_KLIA_English.pdf
- TheStar MAHB Huawei MOU – https://www.thestar.com.my/business/business-news/2019/08/20/huawei-malaysia-to-support-mahb039s-digital-transformation
- Cisco Spine & Leaf Architecture – https://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/white-paper-c11-737022.html
- Copper degradation – https://www.quora.com/Does-a-signal-sent-over-a-cable-network-degrade-over-time
- Potential malicious intent – https://www.thestar.com.my/news/nation/2019/08/26/mahb-not-ruling-out-malicious-intent-behind-klia-glitch
- The Star (22 Aug 2019) – https://www.thestar.com.my/news/nation/2019/08/22/mahb-expects-klia-glitch-to-be-resolved-by-friday-morning-aug-23
- MAHB update (23 Aug 2019) – http://www.malaysiaairports.com.my/media-centre/news/latest-update-systems-disruption-klia-0
- The Malay Mail – Day 3 – https://www.malaymail.com/news/malaysia/2019/08/24/klia-systems-still-crippled-but-operations-improving-on-third-day-video/1783804
- The Malay Mail – Day 4 – https://www.malaymail.com/news/malaysia/2019/08/25/klia-operations-back-to-normal-after-system-outage/1784006
- The Star – https://www.thestar.com.my/business/business-news/2019/08/27/airlines-to-seek-mahb-compensation-for-delays-losses