Malaysian Airport Incident – A case study

Last updated: 4 September 2019

Acknowledgement

The information in this post was gathered through crowdsourcing, thanks to the IT Security SIG set up by Nigel Rodrigues. Many contributed, and the candid discussion there inspired me to write this article.

As this incident is still developing, this article will be updated with the latest information; what you see here is a snapshot at a point in time.

The incident

On 21 August 2019, the KLIA/KLIA2 airports began to experience system and technical difficulties. The failure affected check-in counters, flight information display systems (FIDS), baggage handling, the airport mobile app, as well as payment systems which rely on the airport's networks. Understandably, tempers flared and dissatisfaction among airport users ran high.

Timeline

20 August 2019 – MAHB signed an MOU with Huawei on technology modernization.

21 August 2019 – KLIA/KLIA2 reported system/technical issues affecting multiple systems in the airport. Initial news indicated a failure in network equipment.

22 August 2019 – The Star reported that MAHB expected the situation to be resolved by 23 August 2019, as it had received new equipment to replace the existing units, with testing to be conducted the same night.

23 August 2019 – MAHB updated its website (as at 6am), explaining that it was in the midst of stabilizing its systems and had deployed additional buses to ferry passengers to their respective terminals.

24 August 2019 – The Malay Mail reported that the situation had improved; passenger flow was reported to be smooth, with intermittent disruptions.

24 August 2019 – NACSA issued a statement affirming that the network issue at KLIA/KLIA2 was not the result of a cyber attack.

25 August 2019 – The Malay Mail reported that KLIA/KLIA2 operations had been restored to normal, based on a check by BERNAMA at 0930.

26 August 2019 – The Ministry of Transport announced a panel to investigate the failure of TAMS (Total Airport Management System). NACSA is one of the members of the committee.

26 August 2019 – MAHB was quoted in a statement saying that it was not dismissing the possibility that malicious intent caused the incident.

26 August 2019 – Airport passengers stated that it was not a full service recovery; the information system was still down and the airports were operating at partial system availability.

27 August 2019 – Airlines sought compensation from MAHB over the airport system downtime.

27 August 2019 – MAHB lodged a police report over possible malicious intent as the cause of the downtime.

28 August 2019 – The PM ordered a probe into the airport downtime incident.

29 August 2019 – PDRM was said to be probing four people in relation to the airport system failure, based on the report made by the senior general manager of the IT division.

30 August 2019 – AirAsia, a Malaysian carrier, was said to have confirmed it would not sue MAHB over the recent airport system failure.

2 September 2019 – Police were said to have recorded statements from 12 MAHB staff over the system failure incident.

3 September 2019 – Four pioneer MAHB IT officers lodged counter police reports against MAHB. They had been suspended, and claimed they were falsely accused.

The cause

The details are vague; however, the incident was attributed to a faulty IP network switch which brought IP network traffic to a grinding halt. The switch in question appears to be the core switch, which processes all the network traffic for the airport.

A core switch is usually responsible for traffic between network segments, acting as an aggregation point. Each area is connected via a smaller switch, which points to an intermediate or aggregation switch, which in turn leads to the core switch. In this case, with the core switch down, the segments were disconnected from one another, even though each machine still showed its local network connection as up. Access upstream, such as to the Internet (which is used for the credit card payment gateway), was also interrupted, as all traffic stopped at the non-functioning core.
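
To make the single point of failure concrete, here is a minimal sketch (in Python, with hypothetical node names) of a star/cascade topology where every segment hangs off one core. Remove the core and each segment can only see itself and its local switch.

```python
# Hypothetical airport switch graph: access switches cascade to
# aggregation switches, which all cascade to a single core.
from collections import deque

links = {
    ("core", "agg1"), ("core", "agg2"),
    ("agg1", "checkin"), ("agg1", "fids"),
    ("agg2", "baggage"), ("agg2", "payment"),
}

def reachable(frm, links, dead=()):
    # Breadth-first search over the switch graph, skipping failed nodes
    seen, queue = {frm}, deque([frm])
    while queue:
        node = queue.popleft()
        for a, b in links:
            nxt = b if a == node else a if b == node else None
            if nxt and nxt not in seen and nxt not in dead:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(reachable("checkin", links))                 # every segment reachable
print(reachable("checkin", links, dead={"core"}))  # only the local segment
```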

Social Media Buzz

It was noted that a user claiming to be a subcontractor to MAHB said the network switch was 17 years old and had never been replaced. This is unconfirmed, pending an official statement from MAHB.

A report from Utusan Malaysia also mentioned something similar.

Related news

Just one day before the incident, on 20 August 2019, MAHB signed an MOU with Huawei “to drive MAHB’s digital transformation framework by enhancing connectivity and real-time information by connecting all stakeholders in one fully integrated digital ecosystem. The collaboration would also seek to set up a fully integrated network communication managed platform to manage above technology and integrated data to enable future big data analysis throughout the entire airport, further improving airport operation efficiency and reduce overall ICT cost.”

It is probably sheer coincidence that the network equipment failed the very next day, seemingly catapulting this initiative up the priority list.

Assessment

At this point, the lack of official news has led to multiple speculations. The first was that the airport was under cyber attack. This was quickly quashed by NACSA, which confirmed that there was no attack.

Another discussion led to the belief that there should have been sufficient DR (Disaster Recovery) infrastructure to ensure business runs as usual. Assuming the social media claim was right, most networks designed at that time would have had a typical star topology, whereby layer-one connectivity cascades back to a single core switch. Using Cisco as an example, a spine-and-leaf architecture would have allowed traffic to be redirected to a different spine switch, had that been the design. Spine-and-leaf is still a relatively new concept; there may be other resilient designs an organization can adopt.
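
For contrast with the star topology above, here is the same reachability check applied to a small spine-and-leaf layout, reusing the reachable() helper from the earlier sketch. The two spines and the leaf names are hypothetical; the point is that losing one spine no longer partitions the network.

```python
# Hypothetical two-spine layout; every leaf uplinks to both spines.
leaf_spine = {
    ("spine1", "leaf_checkin"), ("spine1", "leaf_baggage"), ("spine1", "leaf_payment"),
    ("spine2", "leaf_checkin"), ("spine2", "leaf_baggage"), ("spine2", "leaf_payment"),
}

# reachable() is the BFS helper defined in the star-topology sketch above
print(reachable("leaf_checkin", leaf_spine, dead={"spine1"}))
# all leaves remain reachable via spine2, unlike the single-core star
```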

The Good

MAHB mobilized its own staff, recruiting and promoting initiatives to get them to assist passengers during these trying times. A poster circulating on social media, dated 22 August 2019, asked staff to assist with the situation at KUL during peak hours (12–2pm & 4–10pm).

MAHB exhibited a strong understanding of the airport's processes, being able to manage with manual procedures and sheer manpower to handle airport operations while the system was down.

Flipside

Assuming the theory about the 17-year-old network equipment is true, there are two possible outcomes. The first: an overzealous CIO might end up saying, “We should sweat our assets more, make sure you don’t buy anything new for the next 15 years! (BTW, are we using the same brand as the airport?)”. Scary, to say the least! It is worth remembering that computer and network hardware is susceptible to degradation over time, down to the copper cabling itself, which is why some data centers make it a point to “re-cable” their infrastructure periodically! Other views include “we’re not an airport, we won’t need to worry about it”.

The second outcome is that investment in IT now becomes justifiable as part of a technology refresh. A more prudent approach to the technology life cycle emerges, and the MAHB story becomes a talking point at Board level, raising the question of whether the assets in use are still (1) maintained, with the necessary support, and (2) within their End-of-Life/End-of-Support windows. This is in line with managing tech debt, ensuring that its compounding interest doesn’t suddenly pop up!

Lessons learnt – so far

1. Have manual processes that can stand in if something fails. Can you operate without technology?

2. Understand the implications of tech debt. It is only a matter of time before it catches up and the organization pays the compounding interest. Reputational damage can be severe and takes time to recover from.

Reference

  1. Malay Mail – https://www.malaymail.com/news/malaysia/2019/08/23/mahb-network-failure-caused-systems-disruption-at-klia/1783638
  2. MAHB Official PR – https://www.malaysiaairports.com.my/media-centre/news/klia-network-disruption
  3. NACSA PR – https://www.nacsa.gov.my/doc/Press_Release_MAHB_KLIA_English.pdf
  4. TheStar MAHB Huawei MOU – https://www.thestar.com.my/business/business-news/2019/08/20/huawei-malaysia-to-support-mahb039s-digital-transformation
  5. Cisco Spine & Leaf Architecture – https://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/white-paper-c11-737022.html
  6. Copper degradation – https://www.quora.com/Does-a-signal-sent-over-a-cable-network-degrade-over-time
  7. Potential malicious intent – https://www.thestar.com.my/news/nation/2019/08/26/mahb-not-ruling-out-malicious-intent-behind-klia-glitch
  8. The Star (22 Aug 2019)  – https://www.thestar.com.my/news/nation/2019/08/22/mahb-expects-klia-glitch-to-be-resolved-by-friday-morning-aug-23
  9. MAHB update (23 Aug 2019) – http://www.malaysiaairports.com.my/media-centre/news/latest-update-systems-disruption-klia-0
  10. The Malay Mail – Day 3 – https://www.malaymail.com/news/malaysia/2019/08/24/klia-systems-still-crippled-but-operations-improving-on-third-day-video/1783804
  11. The Malay Mail – Day 4 – https://www.malaymail.com/news/malaysia/2019/08/25/klia-operations-back-to-normal-after-system-outage/1784006
  12. The Star – https://www.thestar.com.my/business/business-news/2019/08/27/airlines-to-seek-mahb-compensation-for-delays-losses
  13. New Straits Times – https://www.nst.com.my/news/nation/2019/08/516444/mahb-lodges-police-report-klia-systems-disruption
  14. The Star – https://www.thestar.com.my/news/nation/2019/08/28/pm-wants-probe-into-klia-systems-malfunction
  15. Malay Mail – https://www.malaymail.com/news/malaysia/2019/08/29/report-police-to-probe-four-over-klia-systems-disruption/1785325
  16. New Straits Times – https://www.nst.com.my/business/2019/08/517398/airasia-wont-sue-mahb-system-glitches-klia-and-klia2
  17. Malay Mail – https://www.malaymail.com/news/malaysia/2019/09/02/klia-systems-disruption-police-record-statements-from-12-mahb-staff/1786552
  18. New Straits Times – https://www.nst.com.my/news/crime-courts/2019/09/518461/4-mahb-it-officers-lodge-police-reports-against-their-employer-over

IT vs Cyber Security – Technology Debt

Where are we today?

Almost daily, we are bombarded with news of cyber attacks, breaches, data leaks and more. It’s as if cyber-related issues are becoming the norm, so much so that someone was quoted saying, “There are two types of organizations: the ones that have been breached, and the ones that have yet to be.” As such, organizations are putting emphasis on spending for continuity, and one question gets asked quite frequently: how much is enough? Is there a magical percentage that a CEO needs to consider as part of healthy spending, to ensure the safeguards are sufficient to manage today’s and tomorrow’s risks?

While there is research on average spend by organization, it is not an accurate reflection of any particular organization’s spending pattern for the protection of its assets. This article aims to demystify technology debt, using security as a lens, in order to identify right-spending for an organization. Technology debt is just one of the considerations when evaluating tech spend versus security spend.

What is technology debt?

A debt is something owed. When someone borrows money, they are obliged to return it (in most instances with interest). Technology debt is no different from conventional debt, except that it takes the form of considerations, protections and governance deferred when rolling out current and new technology.

The interest on technology debt is the occurrence of an event which creates an additional burden on the organization. For example, a cyber breach causes additional overheads: manpower utilization, engagement of third parties for services such as recovery and forensics, as well as other expenditure incurred.

Does technology incur debt? How does it work?

To illustrate, here are three examples of how technology debt is incurred.

Scenario 1

An end user procures a computer for home use, gets the operating system installed and starts using it. Finding the computer very useful and engaging, the user starts relying on it not just for work and assignments, but also for personal content consumption such as videos, websites and even social media. One day, the user encounters a phishing email, downloads an attachment, and is infected with ransomware. As the work is important and needs to be sent to the customer, the user ends up paying the ransom.

In this case, the tech debt was incurred at the point the user started using the computer. The debt was the obligation to secure the machine with the necessary protections in place, such as endpoint protection and phishing alerts. Because the debt went unpaid, the user ends up paying with interest, i.e. the ransom, in order to retrieve the data.

Question: does the debt end here? Yes and no. While the ransom is paid (the interest), the debt (the principal) is still there. The debt only goes away when the user secures the endpoint/laptop/machine and removes the “debt” altogether.

Scenario 2

A hardware store has purchased a Point-of-Sale (POS) terminal, primarily to ensure sales tax calculations are done and the reports are available for submission to the authorities. A thermal printer prints the receipts, with the computed tax value as per regulations. A barcode scanner makes it easy to input item codes during checkout. It became very convenient, so much so that even the inventory was managed effectively. Life seemed easier, thanks to the new technology. The POS came with a 1TB hard drive, which is almost impossible to fill.

One day, for some unfortunate reason, the hard drive in the Point-of-Sale machine crashed. This caused considerable inconvenience, as the items had to be computed manually. Because of the convenience of the POS system, prices were no longer printed on items; the reliance was on product barcodes. A manual price list had to be derived after calling the vendors for price confirmation. What made it worse? The taxation department decided to show up for an audit, demanding to see the taxation report that was supposed to be produced ad hoc as part of the system requirements for taxation.

The technology debt in this case is the inability to back up and restore the system. While the reliance on the system was convenient, the debt (backup/recovery) had been incurred, and the user ended up paying interest: fines due to non-compliance, additional recovery services, instituting manual processes, and wasted time.

Scenario 3

A mobile app development firm has purchased a server to store its source code. The server is backed up daily to DVD, and a copy is kept at a separate site. The server is configured with a detailed access control list to ensure only the right people have access to the right parts of the code.

A disgruntled employee decided to take matters into their own hands and deleted a portion of the code on the day they were leaving. The manager discovered the issue when reviewing the CI/CD logs after a build failure and found files missing. Upon inspecting the version control software, the manager identified the malicious action that had taken place, recovered the part of the tree that was lost, and compared it against the backup to ensure the changes were consistent.

This case shows a zero-debt scenario. While deploying the solution, the IT team took into consideration the requirements for backup, audit logs and a continuity plan. When a potential “interest” scenario came up, because the debt was zero, there was no or minimal impact to the organization.
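
As a small illustration of the verification step in this scenario, here is a minimal sketch that compares a restored source tree against the offline backup by hashing every file. The paths are hypothetical, and a real setup would generate the manifest as part of each daily backup.

```python
# Compare a restored tree against a backup copy, file by file.
import hashlib
from pathlib import Path

def manifest(root):
    # Map each file's relative path to the SHA-256 of its contents
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(root).rglob("*") if p.is_file()
    }

def compare(backup_root, restored_root):
    old, new = manifest(backup_root), manifest(restored_root)
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            print(f"[!] missing after restore: {path}")
        elif path not in old:
            print(f"[?] new file not in backup: {path}")
        elif old[path] != new[path]:
            print(f"[!] content differs: {path}")

# Hypothetical mount points for the DVD backup and the restored tree
compare("/backup/dvd_2019_09_01/src", "/srv/restored/src")
```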

How does tech debt influence budgeting?

As technology gets deployed, as illustrated above, debt starts accruing. In some organizations, the debt is addressed up front as technology is deployed, to avoid interest. Other organizations spread the debt out over time, hoping the interest never comes due.

How does this influence budgeting? The budget to manage security should include ensuring the debt is addressed in a timely manner. For organizations that have incurred debt, expenditure is needed to zero it out. As budgets are usually a single line item for an organization, this shows up in the percentage split between IT spend and security spend.

Hence, for an organization with heavy tech debt, the budget will lean more towards resolving the debt than towards expanding IT. The percentage split will be skewed, as the debt now influences the spend percentage.
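
A toy calculation makes the skew visible. All figures below are hypothetical, purely to show how outstanding debt remediation inflates the security share of a fixed budget:

```python
# Hypothetical numbers: how debt remediation skews the IT/security split.
budget = 1_000_000            # total technology budget
debt_remediation = 350_000    # refresh, patching, backup gaps, etc.
security_baseline = 150_000   # "normal" security run-rate

security_spend = security_baseline + debt_remediation
it_spend = budget - security_spend

print(f"security: {security_spend / budget:.0%}, IT: {it_spend / budget:.0%}")
# security: 50%, IT: 50% - versus 15%/85% for a debt-free organization
```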

Another reason the spend can be skewed is when the interest comes into play. Due to an incident, the interest matures and becomes payable. This creates additional expenditure which eats into the budget. Post-incident, organizations usually put more emphasis on governance and control, almost writing a blank cheque to show commitment, in most instances including hiring a CISO who reports directly to the CEO and the Board.

The result: the spend percentage against the overall budget differs based on the level of debt resolution and the state of the organization. Mature organizations resolve debt as technology is incorporated, while others play catch-up due to business and budget limitations. What’s important is to be mindful that the debt may spring interest at any time, causing the organization to end up spending more. Delayed investment may result in heightened expenditure.

While the scenarios presented above may be simplistic, it is worth remembering that technology debt is often multi-dimensional and requires an in-depth study to ascertain the respective areas of protection required. In a future article, we can discuss this multi-dimensional aspect of tech debt and how to resolve the debt while preventing interest.

Moving forward

The crux of this article is to explain why different organizations have different budget splits. Though a spend baseline helps CEOs identify whether spending is healthy, understanding technology debt helps justify why the spend needs to be higher for some organizations. While most organizations look at analyst reports on average security spend, it is wise to keep technology debt in check to prevent interest from popping up.

Perhaps, if there is enough interest, I can write a follow-up on identifying and resolving technology debt.

Geopolitical considerations as part of Technology risk

This thread started off as a discussion at the local Mamak (the Malaysian colloquial term for your neighbourhood cafe), where a bunch of security and tech folks met up to ponder the world and its business woes.

The discussion started off with the question “How do you decide on your tech purchase? What are your consideration factors?”

Our conservative buddy spoke up first: “You can never go wrong with Brand X! Tried and tested.” That suggests selection criteria based on market presence, branding, prominence and adoption.

The bleeding-edge, challenge-the-status-quo person asked, “Why not Open Source?” It’s mature enough for adoption, and more organisations are cozying up to the idea that Open Source will work, provided that support is available.

Then came the CIO, who made it clear that the choice would be cost-based. Why bother paying a premium when you can get a good bargain at a reasonable price? Pricing would be the ultimate deciding factor, provided the bare minimum is met.

I had to open my mouth and ask, “What about geopolitical considerations?” Everyone looked flustered, some in amazement, and some pretended it wasn’t even a factor. Geopolitical? Is that even necessary?

What is geopolitical consideration/risk?

This is a consideration where you look at the country of origin of a technology and consciously decide to use technology from another country. For example, if the first tier of firewalls originates from the US, the second tier may be purchased from Russia (ignoring that the underlying hardware may all originate from China; the consideration here is vendor origin, not component origin, although the latter would be a stricter version of geopolitics-based risk separation).

History Lesson – PGP

A little bit of a history lesson on technology, starting with cryptography. PGP was created by Phil Zimmermann in 1991, with the intention of securing communications between activists and preventing snooping. The software was free to use, as long as it was not for commercial use. Eventually PGP found its way onto the Internet and was adopted for widespread use as an added encryption layer on top of email.

In 1993, Zimmermann became the target of a criminal investigation. Strong cryptography was subject to US export restrictions at the time, being classified as a munition, a definition that covered “guns, bombs and even software”, and PGP shipped strong encryption with RSA keys defaulting to 1024 bits. Zimmermann was investigated for “munitions export without license”. For reasons never made public, the case did not proceed and was eventually dropped without any criminal charges being filed.

Zimmermann was determined to make his software public. He identified a loophole: the First Amendment protects the export of books. Through MIT Press, Zimmermann published the source code of PGP as a book. One simply had to procure the book, scan the contents and digitize them using OCR (Optical Character Recognition), or simply type the code in by hand.

More challenges on export

A similar situation happened to D.J. Bernstein, who wanted to publish the source code of his Snuffle encryption system. Together with the EFF, Bernstein challenged the export ruling. After four years and one regulatory change, the Ninth Circuit Court of Appeals ruled that software source code is protected by the First Amendment, and that government regulations preventing its publication are unconstitutional.

Why geopolitical risk?

The world is already borderless; technology crosses boundaries easily and without much hassle. However, G2G (Government to Government) relationships are never that smooth. Technology sold by a company is governed by the laws of the country in which that company is headquartered. Hence, the law of the land plays an important, if indirect, role: governments end up determining the availability of technology.

The most common technology denominator is the USA, which produces the majority of the technology innovations the world uses. An example used earlier in this article is encryption/cryptography technology: as algorithms became prevalent, their use often became subject to export restrictions.

The rise of nation states

A borderless world creates borderless problems. The hacking scene (not the “Texas Chainsaw Massacre” type) used to be fueled by hormone-raging, idealism-filled teens, or just curious cats trying to learn tech. But today, dominance in cyberspace is seen as a sign of “cyber-sovereignty”, and an arms race towards cyber dominance is imminent. (Man, I really abused the word cyber this time...)

As explained earlier, the battleground has shifted into the cyber world, and corporates are becoming unwilling victims in the fight for dominance. Nation-states may infiltrate large corporate organizations to further their agenda, implanting their own tech people to directly influence product builds. This means that products being shipped may potentially be seeded with malicious code, backdoors or even intentional vulnerabilities for nation-state actors to abuse freely.

Export laws, sanctions and politics

Open any news site right now and you’ll read about trade wars between governments. In recent news, one government stood firm and took action against another country for alleged espionage. This resulted in key companies from that country being denied business and subjected to high levies and taxes. The situation created a tit-for-tat reaction, causing a downward spiral of impact on other organizations that form part of the ecosystem.

Standards and tech volition

In a new twist to the developing story, standards organisations are now becoming subject to such rulings. One standards body referred to worldwide has stepped up and banned researchers from the said country from acting as moderators or participating in standards development. This has far-reaching impact on the global community.

Firstly, countries that are not part of the trade war become unwilling victims as the standards body aligns itself with one country’s stance. Secondly, countries now have to re-evaluate and establish their own standards, or subscribe to a common standard in which all vendors are given a chance to participate. ISO (the International Organization for Standardization) is a global standards body that prides itself on independence from country-level politics (even though its standards are voted on along country lines and affiliations).

On one hand, you need a standards body as a reference point; on the other, you will need to start excluding standards bodies that show affiliation with country-level policies. Aligning standards to a country-specific set will be another arduous task.

Long story short

Countries today can no longer exclude geopolitical risk factors. This is evident in recent developments in the international arena, the current trade wars and Brexit. While moving towards Industry Revolution 4.0, it is important to stop living in a shell and understand that borderlessness is a reality, and that new sets of regulations are emerging to govern tech and its use.

Insider Threat – A look at AT&T incident

In a recent exposé published by SecureWorld, based on court documents it had seen, this issue has suddenly hit the spotlight.

The damning question: can your employees be bought?

Let’s look at the reported news on the incident experienced by AT&T Wireless. It began at the AT&T Wireless call center in Bothell, Washington, where call center employees knowingly shared their credentials with a cybercriminal in exchange for money. According to the DOJ, based on the indictment documents, the call center employee who made the most was paid “$428,000 over 5 years scheme”.

There were three things that the employees did:

  1. The employees were instructed to install malware on their machines.
  2. The employees installed unauthorized access points: hardware devices that create a backdoor into the network.
  3. The employees installed specialized malware that performed phone unlocking through AT&T’s internal network, using valid AT&T credentials obtained from the call center agents.

The objective of the “intrusion” was to create unlocked phones. Phones sold in the US are carrier-locked, meaning that once a phone is provisioned, only AT&T service can be used on it. Having the phones carrier-unlocked creates a huge market, with the phones sold on eBay and other online stores.

This begs the question: why would the phones require unlocking in the first place? Phones are locked to a carrier because they are subsidized and tied to a contract. When a user travels overseas, a phone may require unlocking for roaming purposes; hence unlocking is a legitimate function of the call center.

This racket netted more than 2 million phones unlocked and sold. At iPhone prices, one can only imagine how much money was to be made.

According to the official documents, the scheme began sometime around 2012, and around October 2013 AT&T discovered the unlocking malware. When questioned, the AT&T staff in question left the organization. The criminals were determined, recruiting new insiders at the same call center the following year. Recruitment happened through Facebook (surprise, surprise, and not LinkedIn) and the bribes were paid in person. The cybercriminal, known as Muhammad Fahd, is now in jail.

A breakdown of this issue

  • Call center agents sold their access and performed illegal acts in exchange for money. Insider threat will remain a key issue, and it is a challenging one to tackle. While a potential control is the “lifestyle audit”, getting trustworthy staff will always be a challenge in a market where skills are limited.
  • Valid access used for illegal activities – this can potentially be addressed by monitoring the activities performed with a given ID. This presumes sufficient logs are in place, plus systems to correlate and analyze usage behavior, with baselining of activities to identify anomalies (a minimal sketch follows this list). If someone performs an unusually high number of unlocks, a check on what was actually done can be triggered. This review process (often slow, painful and frequently manual) is usually avoided as unnecessary workload, though I am sure AT&T would enforce it as a requirement now.
  • Installation of malware – call center agents should never, ever have administrative access. The ability to install or run applications should be limited through application whitelisting. However, that still leaves the issue of a malicious IT technician, which may have been a factor in this scenario.
  • Installation of access points – rogue access points can be detected with Wireless Intrusion Prevention Systems (WIPS). However, WIPS present a different set of problems, as they may deny legitimate wireless use when neighboring buildings’ APs spill over, effectively causing a denial of service.
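
Here is a minimal sketch of the baselining idea mentioned above, with hypothetical agent names and unlock counts; a real deployment would pull these numbers from correlated system logs or a SIEM:

```python
# Flag agents whose daily unlock count sits far above the population.
from statistics import mean, stdev

daily_unlocks = {
    "agent01": 12, "agent02": 9, "agent03": 14, "agent04": 11,
    "agent05": 10, "agent06": 13, "agent07": 172,  # hypothetical outlier
}

def flag_anomalies(counts, z_threshold=2.0):
    values = list(counts.values())
    mu, sigma = mean(values), stdev(values)
    for agent, n in counts.items():
        z = (n - mu) / sigma if sigma else 0.0
        if z > z_threshold:
            print(f"[!] {agent}: {n} unlocks (z={z:.1f}) - review activity")

flag_anomalies(daily_unlocks)  # flags agent07 for manual review
```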

Taking the original question into perspective: can employees be bought? The answer is multi-faceted, even where the technical challenges can be addressed.

  1. More money is always appealing. Looking at the money being made, a call center agent would have jumped at the opportunity because of the sheer amount on offer.
  2. The moral obligation of doing the right thing. In any such case, you will hear many reasons why the staff did what they did. From making ends meet to “it didn’t hurt anyone”, moral standing has always been on shaky ground.
  3. The economics of organizations also plays a part. Income disparity and job satisfaction versus workload become talking points. Most call center agents bear the brunt of customer frustration and are often yelled at. Call centers thus become a churning pot for most organizations, and those who stay are often resilient, understanding that it is a thankless job.
  4. Making examples – some organizations motivate employees to do the right thing by (1) having a whistleblowing policy to aid reporting and (2) showing examples of action taken against wrongdoing. The prospect of losing one’s livelihood has always been a strong motivator to do the right thing.

Reference

SecureWorld – https://www.secureworldexpo.com/industry-news/insider-threat-at-att-wireless-activated-by-a-cybercriminal


Capital One – The Breach


The incident

Capital One issued a press release on 29 July 2019 stating that an outside individual gained unauthorized access to its customer information. The information obtained was credit card application data, for applications made between 2005 and early 2019. The information breached includes:

– Name

– Addresses, ZIP/Postal Codes

– Phone number

– Email addresses

– Date of Birth

– Income information (self reported)

– Status information – credit scores, credit limits, balances, payment history, contact information

– Transaction data from a total of 23 days during 2016, 2017 & 2018

– Social Security Numbers of about 140,000 customers, and about 80,000 linked bank account numbers of secured credit card customers

Existing customers do not seem to be affected, as the system in question was specific to the credit card application facility.

What happened?

According to Capital One, the “highly sophisticated individual” was able to exploit a certain configuration vulnerability. The NYT added that it was a misconfigured firewall on a web application, which echoes the court documents pointing to a misconfigured firewall on Capital One’s Amazon Web Services cloud infrastructure. The information was accessed between March 12 and July 17.

More than 700 folders of data were stored on the server.

The hacker

The FBI arrested Paige A. Thompson, who goes by the nick “erratic”, according to the Justice Department. Ms Thompson appeared in the Seattle District Court on July 29, 2019 and was ordered detained pending a hearing on August 1, 2019.

Ms Thompson had posted on GitHub about the information theft, which a GitHub user reported to Capital One on July 17, 2019. Capital One contacted the FBI on July 19, 2019 after confirming the breach was legitimate, and the FBI confirmed the identity of the attacker.

Ms Thompson had previously worked at Amazon Web Services. It was also evident that she left online trails of her hacker activities. She is listed as an organizer of “Seattle Warez Kiddies”, a group on Meetup, which led investigators to her other online identities on social media such as Twitter and Slack. The nick “erratic” was traced back to Ms Thompson because she had previously posted a photograph of an invoice for veterinary care services.

Ms Thompson was quoted as saying “I’ve basically strapped myself with a bomb vest” in a related Slack posting, according to the prosecutors. If convicted, she faces the possibility of a USD250K fine and up to 5 years in jail.

The victim (?)

Capital One anticipated it would incur losses of up to USD150 million, which includes paying for customers’ credit monitoring services. Credit monitoring and identity protection services are being offered as compensation to those affected.

Capital One may also face potential regulatory fines/sanctions, which at this point in time are still undetermined, as well as lawsuits.

The New York Times also reported that Amazon has deflected any blame for the incident. Amazon told Newsweek that “this type of vulnerability is not specific to the cloud”. Misconfiguration, be it at the application or data bucket layer, seems to be the leading cause of data theft from cloud infrastructure, as seen in past cases such as Attunity. Amazon maintains that “you choose how your content is secured”.

Situational Analysis

SocMed seems to be abuzz over whether the focus should be on the attacker, since it is a criminal offense, while Capital One walks free. While the attacker may have committed a crime, the question is: could it have been prevented?

From a criminal aspect, what Ms Thompson did is illegal. The proof was seemingly handed over by Ms Thompson herself, through her numerous posts and articles, as well as poor opsec in posting the invoice. Her public persona indicates a leaning towards hacking, and her postings on social media channels indicate admission. The prosecutors would have all the evidence needed to convict Ms Thompson by following the digital trail. In my opinion, the prosecutors have an open-and-shut case on their hands.

Capital One was also dissected on social media for its role in the incident. The question remains whether Capital One had done everything it possibly could to ensure such issues do not occur. Reading the press release, Capital One plans to “augment routine automated scanning to look for this issue on a continuous basis”. It is not clear how to interpret that: whether routine automated scanning has only recently been introduced, or whether the existing scans were enhanced to cover misconfiguration-related issues.

What’s next?

Companies with a cloud presence have a different set of security concerns to address, whereas a traditional on-prem presence seems to offer more direct control. Some quick action items for organizations concerned with such issues:

I. Train your staff on cloud security. Training can be provider-specific as well as provider-agnostic.

II. Providers such as Amazon and Azure publish configuration templates which can be used to roll out services securely. These templates are secure by default and disallow insecure setups; any insecure setup should be reviewed and follow the internal process for deployment and approval.

III. Deploy tools to check for misconfigurations on a periodic basis (a minimal sketch follows this list).

IV. Separate instances based on environment type: Development/Testing/Production.

V. Enforce strict IAM/PAM (Identity and Access Management / Privileged Access Management) to ensure access is managed effectively.
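
As an illustration of item III, here is a minimal sketch of a periodic misconfiguration check. It assumes AWS with boto3 purely as an example; the two checks (public ACL grants and a missing Public Access Block) are common S3 pitfalls, not an exhaustive audit:

```python
# Periodically flag S3 buckets with risky public-access settings.
import boto3
from botocore.exceptions import ClientError

PUBLIC_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def audit_buckets():
    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        # Flag ACL grants to "everyone" or "any authenticated AWS user"
        acl = s3.get_bucket_acl(Bucket=name)
        for grant in acl["Grants"]:
            if grant["Grantee"].get("URI") in PUBLIC_URIS:
                print(f"[!] {name}: public ACL grant ({grant['Permission']})")
        # Flag buckets with no Public Access Block configured at all
        try:
            s3.get_public_access_block(Bucket=name)
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                print(f"[!] {name}: no Public Access Block configured")

if __name__ == "__main__":
    audit_buckets()
```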

References

  1. New York Times – https://www.nytimes.com/2019/07/29/business/capital-one-data-breach-hacked.html
  2. TechRadar – https://www.techradar.com/sg/news/capital-one-hit-by-major-data-breach
  3. CNET – https://www.cnet.com/news/capital-one-data-breach-involves-100-million-credit-card-applications/
  4. US Dept of Justice – https://www.justice.gov/usao-wdwa/pr/seattle-tech-worker-arrested-data-theft-involving-large-financial-services-company
  5. Capital One – http://press.capitalone.com/phoenix.zhtml?c=251626&p=irol-newsArticle&ID=2405043
  6. NewsWeek – https://www.newsweek.com/amazon-capital-one-hack-data-leak-breach-paige-thompson-cybercrime-1451665
  7. US Department of Justice – Case details – https://www.justice.gov/usao-wdwa/press-release/file/1188626/download

Do you need BCP for Cloud?

I woke up feeling very warm. I thought I had missed the alarm, but it was just 3:23 am. Very sure I didn’t need a potty break, extremely sleepy and obviously upset, I leaned over to check the AC (air conditioning) and found that it was off. It was definitely too warm, and by then the AC should have kicked in. Mumbling, I roused my already tired and weary body and walked towards the thermostat to see what was happening.

After blinking a few times to get my sight back to normal, I found that the Nest thermostat wasn’t working. Walking back to my bedside table to grab my phone (I know, it’s a bad habit), I checked whether the internet was down. WiFi seemed up; I checked my public IP (instead of good ol’ ping), and everything seemed okay. Google search loaded fine. Still with sleep in my head, I rummaged through my bedside drawer for the remote and turned the AC on. “This is too much work,” grumbled my half-asleep head. That was enough for the night.


I woke up in the morning with a sleep hangover (yes, that’s possible when you don’t get enough sleep), trying to figure out what had happened. I turned on Twitter and, true enough, reports of a Google Cloud services failure were trickling in.

URL: https://status.cloud.google.com/incident/compute/19003

The horror! Google Cloud services went down?

*My panicked head screaming – The sky has fallen! The Sky has fallen!*

This pretty much explains why the thermostat went down. I wondered how many threat actors lost their C2 hosted on Google services, how many IoT devices like the Nest thermostat stopped working, and how many other dependent services failed. If, as an end user, I was grumbling about service availability, what about corporate organisations relying on cloud services?


Today’s organizations rely heavily on cloud. Business runs on cloud. Social media runs on cloud. Almost everything runs on cloud, whether it’s servers/virtual servers, serverless or functions (you name it). (Disclaimer: most of my stuff also runs on cloud...)

But is a cloud outage a rarity? Well, that depends on what you deem rare. The Internet forgives, but never forgets. On August 25, 2013, AWS suffered an outage, bringing down Vine and Instagram with it. On March 14, 2019, Facebook went down, taking WhatsApp with it, in an apparent server configuration change issue.

The impact is obvious: businesses lose revenue when services go down. A local franchise such as AirAsia runs its kit mostly on cloud. The impact is devastating; imagine flight bookings going dark. The same goes for many other businesses. This brings up an interesting point: what is your business continuity plan if the cloud goes down?

When I had this conversation a few years ago, most CIOs I spoke to boldly claimed that their BCP is the cloud (we never reached the part about cloud and security, because the conversation was most often dominated by the cost debate). There was no need for anything more, they argued, given the apparent global redundancies of cloud infrastructure. The once-sleeping-soundly-at-night CIOs are now rudely awakened (just like me, thanks to the broken thermostat) to the fact that the cloud no longer offers the comfort they thought they had bought, after years of CAPEX (capital expenditure) and happily paying cloud providers their monthly dues to keep services up.

A few points to note for those even thinking about cloud BCP. Yes, it’s time we take the skeletons out of the closet and start talking about this.

Firstly, can your application and services run on a completely different cloud provider? Let’s look at the layers of services before we answer this question.

(Image: XKCD – The Cloud)

If you are running server images (compute cloud), it is entirely possible to run on a different cloud provider. You will need to be able to replicate the server image across providers. You can script the setup of your cloud server, create a repository to host your configuration files, and execute the setup script to bring up the services at a separate provider. The setup and configuration can be hosted in a private git/svn repository and called up when needed.
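
A minimal sketch of that idea, with a hypothetical repository URL and an nginx web tier as the example service; the point is that the same script can replay the setup on a fresh VM at any provider:

```python
# Provider-agnostic bootstrap: pull versioned config, replay setup steps.
import subprocess

CONFIG_REPO = "git@git.example.com:infra/web-tier-config.git"  # hypothetical

def run(*cmd):
    # Echo then execute each step, aborting on the first failure
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def bootstrap():
    run("git", "clone", CONFIG_REPO, "/opt/config")
    run("apt-get", "update")                        # assumes a Debian-family VM
    run("apt-get", "install", "-y", "nginx")        # example service
    run("cp", "/opt/config/nginx.conf", "/etc/nginx/nginx.conf")
    run("systemctl", "enable", "--now", "nginx")

if __name__ == "__main__":
    bootstrap()
```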

What about data? Most database services provide replication and data backup. For “modern” database services, data can be spread across multiple databases for better availability and redundancy.

The real stickler for hybrid cloud is serverless/function-based hosting. If the organization invests heavily in one particular cloud provider’s technology (without naming any), portability depends on that technology. If something common such as Python is used, portability is pretty much assured. Technologies exclusive to one cloud provider will pose portability issues across providers.

Another question that needs answering: how would you “swing” your services across different cloud providers? A common approach for internet-facing availability is DNS. Using DNS, the organization can change the location of services by changing DNS records, allowing seamless failover without having to change the URL. However, the speed of failover is determined by the DNS TTL (time-to-live) configured on that record. Too low, and your DNS is constantly hit with queries, but changes propagate almost instantly (a low TTL is usually around 15 to 30 minutes). Too high, and your DNS infrastructure sees less traffic, but it takes a long time before the failover actually happens. DNS-based failover also creates an administrative headache for firewall administrators, as they have to move from IP-based to DNS-based access control lists.
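
Here is a minimal sketch of that failover logic. The health check URL, the secondary IP and update_dns_record() are hypothetical placeholders; in practice the update would call your DNS provider’s API (Route 53, Cloudflare, etc.):

```python
# DNS-based failover: if the primary site fails its health check,
# repoint the record at the secondary provider.
import urllib.request

PRIMARY = "https://primary.example.com/health"  # provider A (hypothetical)
SECONDARY_IP = "203.0.113.10"                   # provider B (hypothetical)
TTL = 900  # 15 minutes: failover within minutes without hammering DNS

def healthy(url, timeout=5):
    try:
        return urllib.request.urlopen(url, timeout=timeout).status == 200
    except OSError:
        return False

def update_dns_record(name, ip, ttl):
    # Placeholder: call your DNS provider's API here
    print(f"would point {name} -> {ip} (TTL {ttl}s)")

if not healthy(PRIMARY):
    update_dns_record("www.example.com", SECONDARY_IP, TTL)
```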


Cloud isn’t just hot air. Moving towards Industry 4.0 (now I’m just throwing buzzwords around), cloud adoption is definitely a core component of the technology strategy each organisation needs. As time goes by, we find that even the cloud is fallible; hence a proper approach to cloud is key to business continuity.

So, what’s your approach towards Cloud Services BCP?