myIdentity/JPN/LHDN – So what happened?

This news has probably died down, as it’s been some time now. I decided to take some time before doing a piece on this issue, to gather some information and perhaps provide some insights into the matter. This will be a developing article, so I will update it from time to time to reflect what’s going on. As they say, it ain’t over till the fat lady sings…

The discovery of this issue is courtesy of Adnan Shukor (Hi xanda!), who discovered it in one of the underground forums. This was the posting that was discovered.

Some interesting points to note. First, the data covers 1979 to 1998 and contains not just the details in the NRIC but also additional data points (i.e. mobile number, email). The rest can be found on the MyKad itself (MyKad is the Malaysian NRIC smart card). The files are offered in JSON/CSV form (I’m not sure whether all the data comes in both formats, or some in JSON and some in CSV).

Secondly, the person claims that the data was obtained from LHDN (the Malaysian income tax department) through the myIDENTITY API.

What’s myIDENTITY?

A good internet citizen like myself would visit the website to get more information. But then the website is not available. Why? I did a little investigation.

It seems that the DNS entry was removed from the authoritative server to remove access to the website.

UPDATE: The site is back up now, after the DNS admins recently updated the DNS server.

myIDENTITY, according to the website, makes it easy for Malaysians to conduct business with government agencies by making it simpler for personal information to be accessed. Hence, the API allows any agency to query, say based on the NRIC number, and get the dataset for that NRIC from JPN (the National Registration Department).

Sounds good. In fact, data sharing should be seamless, and it eliminates duplication and redundancy since the data is only an API call away. But I also found something else on the website.

The last line of the website reads like this.

“Penafian: Kerajaan Malaysia tidak bertanggungjawab terhadap sebarang kehilangan atau kerugian yang mungkin dialami akibat penggunaan maklumat yang diberikan. Testing CRS….”

I’m going to attempt to translate based on my high school Malay knowledge. Please excuse any inaccuracies.

“Disclaimer: The Government of Malaysia is not responsible for any loss or damage that may be suffered as a result of using the information provided. Testing CRS….”

Now, why would an official government facility put such a statement? I have no idea.

Back to the show.

So we’ve ascertained that myIDENTITY does have an API. How would this crime be committed, then? I look back at the dataset on offer: 1979 to 1998. The NRIC has the well-documented format YYMMDD-XX-YYYY, where YYMMDD is the date of birth, XX is a two-digit place-of-birth code and the last four digits are a serial number. If I write a simple script in <insert your fav coding language> to run through numbers in that format, I can generate the request parameters. All I need to do then is post them to the API.
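To get a feel for how noisy such a brute-force run would be, here is a rough back-of-the-envelope sketch of the search space. The assumptions are mine, not from the leak: birth dates from 1 Jan 1979 to 31 Dec 1998, any two digits in the middle field (in reality not every code is assigned, so this overstates), any four digits at the end, and a purely hypothetical rate of 1,000 requests per second.

```python
from datetime import date

# Assumptions for illustration only: the date range, the "any digits" fields
# and the request rate below are guesses, not facts about the incident.
FIRST_DAY = date(1979, 1, 1)
LAST_DAY = date(1998, 12, 31)


def candidate_count(first: date, last: date) -> int:
    """Upper bound on NRIC-shaped numbers for birth dates in [first, last]."""
    days = (last - first).days + 1  # inclusive range of birth dates
    middle_codes = 100              # two digits: 00 to 99
    serials = 10_000                # four digits: 0000 to 9999
    return days * middle_codes * serials


def days_to_enumerate(total: int, requests_per_second: float) -> float:
    """Days needed to try every candidate at a steady request rate."""
    return total / requests_per_second / 86_400


total = candidate_count(FIRST_DAY, LAST_DAY)
print(f"{total:,} candidates")
print(f"{days_to_enumerate(total, 1000):.1f} days at 1,000 requests/second")
```

Under these assumptions that is billions of requests running for weeks on end, which is the kind of traffic that should not go unnoticed on a private API.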

I did some mediocre Googling and I couldn’t find the API endpoints (I’m not as l33t as most of you, age is catching up…). So, I came to the conclusion that the API is most probably private (of course). Which leaves us to the next theory.

It can only be done by someone or some entity who has access to the API. Cue LHDN, to whom the leaker attributes the data. That makes sense. So, someone pulled the data through the myIDENTITY API, via a script, using access (maybe) given to LHDN.

This leaves us with only 2 possible routes, assuming that LHDN was the source of query.

Possibility 1: Someone found a way to access the API through LHDN’s existing website. The website is the most likely user of the API, besides the internal systems. Maybe it’s the same system; at this point I have no clue about LHDN’s internal systems. Play along folks…

Possibility 2: A vendor who maintains the system within LHDN and understands how the API works decides to get curious and starts making API calls from one of LHDN’s systems (maybe the API only allows access from specific IP addresses). He/She/They may have left the script querying endlessly over the weekend. Come Monday, the vendor staff goes back to terminate the script and, lo and behold, gigs of data. Bad year for the vendor (you know, pandemic and stuff), no bonus. Staff gets pissed and leaks the data. (I’m just creating a story here, learning how to write stories so that my articles become more interesting).

So far, we’ve dissected the incident, and there are nuggets of wisdom for blue teams as well as developers to take note of. Interestingly, I had such a discussion with @Chan Wei Min about securing APIs on twtr. We’ll get to that at the bottom of this article. Good stuff always comes last; otherwise how can I retain readership? (Be fair, don’t skip)

Let’s look at the news article.

LYN reports that a multi-agency investigation, spearheaded by PDRM, has begun. The seller is asking BTC 0.2, equivalent to RM35,495, as payment for the data. No news on whether anyone has purchased the data. LYN also confirms that the police are not ruling out the possibility of an insider threat (Possibility 2).

LHDN refutes the claim that its website is the source of the leak. LHDN confirms that it is only a user of myIDENTITY and does not own the platform. LHDN reveals that its own internal investigation showed no leakage of information at its end. LHDN insists that all the data and information under its custody is safe and protected by “recognized data security technology” (Sorry, my brain may not be working today, but I have no clue how to decipher the stuff in double quotes; I’m thinking of ROT13, which is recognized in the industry). (Back to Possibility 1)

KDN, through its minister, was a little kinder with information. TRP reports that YB mentioned that there are over 100 users of myIDENTITY (the actual number of allowed users is 104), and that the leak could have originated from any one of those, not just LHDN. YB also did not deny the validity of the data being sold. In an article by MalaysiaNow, he confirms that JPN insists there is no data leakage.

Technically there is no leakage at JPN itself, which makes the statement true only in a somewhat stretched sense. At this point, there are a few unanswered questions.

  1. Is the data authentic and true? If yes, proceed to the next point.
  2. Was myIDENTITY the source of the data? (Seems like it, no one seems to be denying it)
  3. LHDN mentions that their website isn’t the source of the leak. But there are still internal systems that use the data. So could it be those?

By logical deduction, Possibility 2 seems the most plausible, supported by the police not ruling out an insider threat.

How to defend against these types of attacks

What we see here is an API user querying a vast amount of data, most likely over a short period of time. Most of the time, APIs are built for functionality, not security or analytics.

One way is to have a Web Application Firewall as a proxy to your API. This mitigates the usual types of attacks and, depending on its capability, it may also provide some insight. Some WAFs are capable of throttling or rate limiting requests, which makes them ideal for reducing (ab)use of your API.
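For the curious, rate limiting is commonly done with something like a token bucket. Below is a minimal in-memory sketch of the idea, not how any particular WAF implements it. The client id is a hypothetical per-agency API key, and the rate and capacity numbers are placeholders you would tune yourself.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client id (hypothetical: an agency API key)."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()


# Example: each client may burst 20 requests, then sustain 5 per second.
limiter = RateLimiter(rate=5, capacity=20)
if not limiter.allow("agency-123"):
    print("429 Too Many Requests")
```

Keying the bucket on the client id rather than the IP means an insider running a script from an allowed address still gets throttled.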

I still believe that while you can “outsource” some of these security functions to a WAF, the best place to discover and mitigate is still at the API itself.

Basic analytics needs to be available for you to know how your API is being used: what the usage trends are, and when the API should be hit at all. Forget about public APIs, they’d be hit any time of the day. But an API such as myIDENTITY? Unless the govt agency runs batch jobs, a high number of queries exceeding a certain threshold, or queries at an odd time, would give you insight into an impending incident. Of course, most professionals would recommend pumping your web logs to a SIEM for all those fancy bells and whistles.
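A rough sketch of what I mean by threshold and time-of-query checks. Everything here is an assumption for illustration: events are (client id, timestamp) pairs you have already extracted from somewhere, the threshold of 500 queries per hour is arbitrary, and "business hours" is taken as 8am to 6pm on weekdays.

```python
from collections import Counter
from datetime import datetime


def flag_anomalies(events, hourly_threshold=500, business_hours=range(8, 18)):
    """Flag (client, hour) buckets that are too busy or outside working hours.

    events: iterable of (client_id, datetime) pairs.
    Returns a sorted list of (client_id, hour_start, count, reasons).
    """
    per_hour = Counter()
    for client_id, ts in events:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        per_hour[(client_id, hour_start)] += 1

    alerts = []
    for (client_id, hour_start), count in sorted(per_hour.items()):
        reasons = []
        if count > hourly_threshold:
            reasons.append("volume")
        # weekday() is 5 or 6 for Saturday and Sunday (an assumed weekend).
        if hour_start.hour not in business_hours or hour_start.weekday() >= 5:
            reasons.append("off-hours")
        if reasons:
            alerts.append((client_id, hour_start, count, reasons))
    return alerts


# Example: one client hammering the API at 2am on a Saturday.
events = [("agency-b", datetime(2021, 9, 25, 2, 0, s)) for s in range(60)]
for alert in flag_anomalies(events, hourly_threshold=50):
    print(alert)
```

If an agency legitimately runs batch jobs, you would whitelist those windows rather than raise the threshold for everyone.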

Remember, to establish analytics for your API, you use the same tools as for your web server. It’s the same logs, but you may have additional logs generated by the API itself.
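As a small illustration, here is a sketch that pulls per-client API request counts out of web server access logs. It assumes the logs are in the Common Log Format that Apache writes by default, that the API lives under a hypothetical /api/ path, and that the client is identified by IP address; your own setup may well differ.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: ip ident user [timestamp] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (ip, timestamp, path) for a log line, or None if it does not parse."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], ts, m["path"]


def api_events(lines, api_prefix="/api/"):
    """Yield (ip, timestamp) for every request to the (hypothetical) API path."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2].startswith(api_prefix):
            yield parsed[0], parsed[1]


def requests_per_client(lines, api_prefix="/api/"):
    """Count API requests per client IP."""
    return Counter(ip for ip, _ in api_events(lines, api_prefix))


# Made-up sample lines using documentation IP ranges.
sample = [
    '203.0.113.7 - - [25/Sep/2021:02:14:07 +0800] "POST /api/v1/lookup HTTP/1.1" 200 512',
    '203.0.113.7 - - [25/Sep/2021:02:14:08 +0800] "POST /api/v1/lookup HTTP/1.1" 200 498',
    '198.51.100.2 - - [25/Sep/2021:09:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024',
]
print(requests_per_client(sample))
```

The same parsing works for any other path on the server, which is the point: nothing about the API needs special tooling to get a first look at who is calling it and how often.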

Additional considerations for your API deployment

  • Can you have rate limiting functions? Pick any permutation: IP, session, client id, etc.
  • How do you establish usage of your API? Can you set a threshold of queries?
  • How do you detect excessive queries? Are there alarms or workflows triggered? (Besides server CPU going 100% or the Apache thread hung)

This matter was discovered post incident, which indicates that most likely (1) such capabilities did not exist, (2) the SOC folks fell asleep during their shift (poor overworked, underpaid SOC staff) or (3) the SIEM was under maintenance when the issue happened, so there was no visibility (maybe I should write about over-reliance on technology in mitigating security issues).

Conclusion

I can’t conclude just yet; I’m sure there is more information waiting to be uncovered or released. I just wish (wish only) that the incident would be documented and released publicly, just like the IHS breach in Singapore was. But knowing Malaysia…

References

  1. [29 September 2021] MalaysiaNow – JPN confirms no data leakage (from KDN) – https://www.malaysianow.com/news/2021/09/29/home-ministry-confirms-no-leak-of-jpn-data/
  2. [28 September 2021] Lowyat.NET – PDRM to investigate the JPN/LHDN data leakage – https://www.lowyat.net/2021/254222/pdrm-investigate-jpn-lhdn-db-leak/
  3. [29 September 2021] TheRakyatPost – No data leaks from JPN – https://www.therakyatpost.com/news/2021/09/29/no-data-leaks-in-jpn-dont-speculate-on-the-issue-claims-home-minister/