Some notes on the Monotonic Counter in Intel SGX and ME

SGX sealing is vulnerable to rollback attacks as the enclave is not able to tell if the sealed data is the latest or a old copy. To mitigate this attack, monotonic counter (MC) has been introduced in Intel SGX SDK 1.8. This post looks into some implementation details inside Intel SGX SDK.

1. sgx_create_monotonic_counter[1]

Look at the description of this API, two things show up – first, the counter creation involves non-volatile memory (NVM) writing; second, limited number of MCs can be created per enclave due to the quota. Apparently, unlike the older version of Intel SGX SDK, where MC was only available in the simulation mode, the MC here is based on real NVM, which also seems to have limited capacity. Let’s go deeper.

2. PSE (Platform Service/Security Enclave?)

The MC creation essentially calls sgx_create_monotonic_counter_ex[2] inside SDK. This function builds two pse_message_t messages for the request and reply, which are both passed to crypt_invoke. The crypt_invoke function establishes a PSE session (secure and authenticated channel) with aesmd, encrypts the request using the shared secret, and calls uae_invoke_service with both the request and reply passed, which eventually calls invoke_service_ocall[3] in the PSE provided by PSW other than the SDK.

3. OAL (OS Abstraction Layer?)

The PSE request and reply are then delivered to oal_invoke_service[4], which essentially transforms the PSE messages into AEInvokeServiceRequest and AEInvokeServiceResponse, and sends the AE messages to aesmd via UNIX socket ‘/var/run/aesmd/aesm.socket’. This MC creation request will then be handled by PSE_MC_SERVICE as a PSE_MC_CREATE command[1]

4. PSE_OP (PSE OPeration?)

The service handler table defined in pse_op.cpp[5] lists all the MC related command PSE_MC_SERVICE supports:

static const struct service_handler_t
{
    uint16_t  service_id;
    uint16_t  service_cmd;
    uint16_t  req_size;
    uint16_t  resp_size;
    srv_pfn_t srv_pfn;
} service_handler[] = {
    {PSE_MC_SERVICE, PSE_MC_CREATE, sizeof(pse_mc_create_req_t), sizeof(pse_mc_create_resp_t), pse_mc_create},
    {PSE_MC_SERVICE, PSE_MC_READ,   sizeof(pse_mc_read_req_t),   sizeof(pse_mc_read_resp_t),   pse_mc_read  },
    {PSE_MC_SERVICE, PSE_MC_INC,    sizeof(pse_mc_inc_req_t),    sizeof(pse_mc_inc_resp_t),    pse_mc_inc   },
    {PSE_MC_SERVICE, PSE_MC_DEL,    sizeof(pse_mc_del_req_t),    sizeof(pse_mc_del_resp_t),    pse_mc_del   },
    {PSE_TRUSTED_TIME_SERVICE, PSE_TIMER_READ,   sizeof(pse_timer_read_req_t), sizeof(pse_timer_read_resp_t), pse_read_timer},
};

And pse_mc_create is our target handler.

5. VMC (Virtual Monotonic Counter?)

Now we are into create_vmc[6], described as “create a VMC in SQLite Database and return UUID of the VMC to the caller”. Huh, so the MC is essentially an entry in a SQLite DB. So how could a SQLite DB defend against rollback attacks given that it is just a normal file on the disk? Hang on. Let’s continue. When this is the first MC creation request ever happend, we need to create the VMC DB first, via initialize_sqlite_database_file[7]. This very first thing it does is to call read_rpdata[8]. As its file name implies, the SQLite DB accesses hardware (NV) to get the hash of the DB at certain epoch, which is essentailly a HW MC. We are getting close to the real HW…

6. PSDA_Service[9]

This file contains all functions used by the VMC SQLite DB to create/delete/update HW MC in the so-far-unknown HW via invoke_psda_service. An ephemeral session is established in the function to provide secure and authenticated communication with the mystery HW, through psda_invoke_service_ocall[10], and eventually psda_invoke_service[11].

7. Intel ME

Just looking at the psda_invoke_service is really insightful:

    JVM_COMM_BUFFER commBuf;
    commBuf.TxBuf->buffer = psda_req_msg;
    commBuf.TxBuf->length = psda_req_msg_size;
    commBuf.RxBuf->buffer = psda_resp_msg;
    commBuf.RxBuf->length = psda_resp_msg_size;
    int response_code;
    ae_error_t ret;

    PROFILE_START("JHI_SendAndRecv2");
    ret = PSDAService::instance().send_and_recv(
                        PSDA_COMMAND_SERVICE,
                        &commBuf,
                        &response_code,
                        NO_RETRY_ON_SESSION_LOSS);
    PROFILE_END("JHI_SendAndRecv2");

JVM? JHI? Finally, we have reached the dark side of the MC. Somehow some Java shit involves in communicating with the secret HW[13] — Yes, it is Intel ME! JHI stands for Java Host Interface, which is used to install Java applets into Dynamic Application Loader (DAL) firmware (inside ME) and communicate with them[12]. Yes, the SGX MC is essentially implemented in ME as a Java applet called “PSDA”!

8. PSDA.dalp

PSDAService::install_psda[14] installs the psda applet into ME using JHI. Not surprisingly, the source code for psda is not available. Instead, the download_prebuilt.sh script is used to download the applet binary directly, since it needs to signed by Intel — Right! Intel does not want anyone to install any applet into ME, for security reasons!!! After running the download script, PSDA.dalp will appear under psw/ae/data/prebuilt. It is encoded as Base64:

image1

Copying out the Base64 encoded applet and running decoding, we have a chance to peek into the implementation of the applet:

com.intel.cryptocom.intel.langutilcom.intel.sgx.psdacom.intel.sgx.psda.commoncom.intel.sgx.psda.servicecom.intel.sgx.psda.service.base&com.intel.sgx.psda.service.entity.base)com.intel.sgx.psda.service.entity.command)com.intel.sgx.psda.service.entity.message)com.intel.sgx.psda.service.entity.storagecom.intel.sgx.psda.sessioncom.intel.sgx.psda.util
                                             com.intel.uicom.intel.util	java.langPlatformServiceDalAppletBytesArrayInputStreamBytesArrayOutputStream
                                  CodeWrapper	Constants#Constants$CSE_PLATFORM_CAPABILITIES!Constants$CSE_SERVICE_RESP_STATUS!Constants$EPHEMERAL_SESSION_STATEConstants$LT_SESSION_STATEConstants$MEDIAPATH_IDonstants$PSDA_COMMARPDataServiceServiceFactoryServiceMessageFactoryTrustedTimeServiceServiceHandlerPSDACommonMessage$PSDA_SERVICE_ID
                                                                                                 PSDAMessagePSDASerializableServiceCommandCSEUnknownReqResponsePRTCReadRequestPRTCReadResponse"ProtectedOutputSetStreamKeyRequest#ProtectedOutputSetStreamKeyResponseRPDataReadRequestRPDataResetRequestRPDataResponseRPDataUpdateRequestCapabilityRespMessageCertificateInfoRespMessageEphSessionMessageLTSessionMessagePlatformInfoReqMessageServiceMessagePSEPairingInfo
   SEFlashData
EphSessionEphSessionManager	LTSessionLTSessionManager
                                                         CalendarUtil
                                                                     DALApiProxy	DebugUtil	FlashUtilSGXMonotonicCounterProxyCryptoException	EpidAlgExHashAlgIllegalParameterExceptionNotSupportedExceptionOperationFailedExceptionRandom
SigmaAlgExSymmetricBlockCipherAlgSymmetricSignatureAlg
TypeConverterProtectedOutput
                            UiExceptioniIllegalParameterExceptionUiIllegalUseExceptioCalendar
DebugPrint
          FlashStorageFlashWearoutException
                                           IOException
UtilExceptionimeExceptionObjectString                 IntelAppletSEMonotonicCounter
StringBuilder   Throwable            StringBuffer
                         API_VERSION
                                    AUDIO_OUTPUT!CAP_PROTECTED_OUTPUT_KEY_EXCHANGE
CAP_RPDATACAP_TRUSTED_TIME
EPID_GID_SIZEERROR_CAP_NOT_AVAILABLEERROR_INTERNALERROR_INVALID_PARAM%ERROR_PERSISTENT_DATA_WRITE_THROTTLEDERROR_UNKNOWN_REQFLASH_STRUCT_VERSIONHMAC_SHA256_SIZEINSTANCE_ID_LEINT_SIZELT_SESSION_STATE_ESTABLISHEDLT_SESSION_STATE_IN_PROGRESSLT_SESSION_STATE_NOT_EXIST
                                    MAC_KEY_SIZEMAX_SESSION_COUNTMC_EPOCH_LENGTH
NONCE_SIZEPAYLOAD_IV_SIZEPAYLOAD_MAC_SIZEPRODUCT_TYPE_PSEPROTECTED_OUTPUT_STREAMKEY_SIZEPR_SIZEPSDA_BAD_PARAMETERPSDA_COMMAND_EPPSDA_COMMAND_LTPSDA_COMMAND_SERVICE PSDA_ILLEGAL_PARAMETER_EXCEPTIONPSDA_INTEGRITY_ERRORPSDA_INTERNAL_ERRORPSDA_INVALID_COMMANDPSDA_INVALID_SESSION_STATEPSDA_LT_PAIRING_NOT_EXISTPSDA_MSG_TYPE_CAP_QUERYPSDA_MSG_TYPE_CAP_RESULTPSDA_MSG_TYPE_CERT_INFO_QUERYPSDA_MSG_TYPE_CERT_INFO_RESULTPSDA_MSG_TYPE_EP_M1PSDA_MSG_TYPE_EP_M2PSDA_MSG_TYPE_EP_M3PSDA_MSG_TYPE_EP_M4PSDA_MSG_TYPE_LT_M1PSDA_MSG_TYPE_LT_M2PSDA_MSG_TYPE_LT_M3PSDA_MSG_TYPE_LT_M4PSDA_MSG_TYPE_SERV_REQPSDA_MSG_TYPE_SERV_RESPPSDA_NOT_PROVISIONEDPSDA_NOT_SUPPORTED_EXCEPTION$PSDA_PERSISTENT_DATA_WRITE_THROTTLEDPSDA_PLATFORM_KEYS_REVOKEDSDA_PROTOCOL_NOT_SUPPORTEDPSDA_SEQNO_CHECK_FAIL
                                                                                   PSDA_SUCCESSPSDA_UNKNOWN_REQEUSTPSDA_UTIL_EXCEPTIONPSE_PROTECTED_OUTPUT_SERVICE PSE_PROTECTED_OUTPUT_SETSTEAMKEYPSE_RPDATA_READPSE_RPDATA_RESETPSE_RPDATA_SERVICEPSE_RPDATA_UPDATEPSE_TIMER_READPSE_TRUSED_TIME_SERVICEPS_COMMAND_INFO
                                                                                     RPDATA_SIZESESSION_ACTIVESESSION_CLOSEDSESSION_IN_PROGRESSSESSION_KEY_SIZESGX_MONOTONIC_COUNTER_TYPE_RPMBSGX_MONOTONIC_COUNTER_TYPE_RPMCSUCCESSTIMER_EPOCH_LENGTHUNIX_TIMESTAMP_2015_01_01
                                           VIDEO_OUTPUTbuffercapDescriptorVersiocertInfocodecount
                                                                                                 currentTime
                                                                                                            enableDebug
encryptKeyencryptedPayload
                          eph_sessionindeinstance
lt_sessionlt_session_state_maskm_bIsFreshPairingm_bWritePending
m_calendarm_idCsem_idPsm_macKeym_mk
m_nonceCsem_pairing_infm_seqNum
                               m_sessionKeym_sessionStatem_sigmaInstancem_sk
                                                                            mediaPathIdmkmsgBodymsgLenmsgTypemsgTypeExpRespSize
             pairing_info
pairing_sk	payloadIv
payloadMac
          payloadSize
prtc_epochpse_instance_ireservedrpEpochrp_datrp_epochrpdata
rpdata_cur
rpdata_newseqNumserialVersionUIDserviceCapDescriptor0serviceCapDescriptor1
set_time_infosgxMonotonicCounterskstatussw_instance_id
timerEpochversionwrappedEncryptKeabandonSessionclearcreateServiceCommandcreateServiceMessage	deriveIDseraseAllFlashData
                        expandBufferfinalizeEphemeralSessionfinalizeLTSessionfindPairingInfogetCurrentEpidGidgetCurrentTimegetFlashDataSize
invokeServiceisFreshPairingisSessionEstablishedonCloseonInitonInternalCommandonInternalOpenHandler
                                                                                           printBuffeprintInt
                                                                                                             printreadFlashDatareadInreadLongonprocessLTSessioreadByte	readBytes
readObjectreadPairingCounter	readShortresetresponseWithMessageseekset!setFlashStoragePropertyAntiReplay
                                                                                                          setResponse	setSeqNumsetTimesigmaSetPropertyAppendSVNsizestartEphemeralSessionstartLongTermSession
                                                                                              toByteArratoStringupdatePairingInfo
               updateRPData	writeByte
writeByteswriteFlashData_0writeFlashData_writeInt	writeLong
                                                                 writeObject
writeShortappend
bytesToInt
copyByteArraycreateortcocreateAlgcreateInstancedecryptCompletedeletedisposeenableHighUpdateFrequencyFlashencryptCofillByteArraygetEncryptedKeyRecord
getMacKeySize
getMessagegetRandomBytes
                        getS1MessagegetS1MessageLengthgetS3MessageLength
                                                                        getSecretKeygetSecretKeySizegetSessionParametergetTime
initialize
isProvisionedlengthprocessCompleteprocessS2MessagesetIVsetKey
                                                             setPropertysetResponseCode
                                                                                       shortToBytes
                                                                                                   signCompleteverifyCompletewriteFlashData*+***-*D*)*;j)*L**)**Sd*+*(***+***%*%*&*&d*)*d**d**ddd*N*Sd*+**d*****dd**S*)dd*!**U*U*R*U*S*T*S*Uddddd*6*<*=*5*;dd*Tdd*L*O*Dd*@ddddddd

9. Intel ME Again

Although we only looked at the MC creation in SGX, this post has been long enough. Last thing about SGX MC is still about Intel ME. If you have seen dmesg like “Device doesn’t have valid ME Interface”, and neither /dev/mei nor /dev/mei0 is created, it is very likely that you are running a server platform with a different ME firmware (SPS) comparing to desktops/laptops[15], and the corresponding DAL firmware is not available in SPS. This means SGX MC is NOT available for servers!!! Well, I guess this does not really matter as people are in a rush to disable ME anyway… On the other hand, implementing SGX MC using ME seems to be the most reasonable choice (at least, we are still inside the CPU package), although this makes “SGX defends against ME attacks” statement in vain.

References:

[1] https://software.intel.com/en-us/node/709160
[2] https://github.com/01org/linux-sgx/blob/master/sdk/tae_service/tae_service.cpp
[3] https://github.com/01org/linux-sgx/blob/master/psw/uae_service/sgx_uae_service.cpp
[4] https://github.com/01org/linux-sgx/blob/master/psw/uae_service/uae_wrapper/src/tae_ocall_api.cpp
[5] https://github.com/01org/linux-sgx/blob/master/psw/ae/pse/pse_op/pse_op.cpp
[6] https://github.com/01org/linux-sgx/blob/master/psw/ae/pse/pse_op/monotonic_counter_database_sqlite_rpdb.cpp
[7] https://github.com/01org/linux-sgx/blob/master/psw/ae/pse/pse_op/monotonic_counter_database_sqlite_bin_hash_tree_utility.cpp
[8] https://github.com/01org/linux-sgx/blob/master/psw/ae/pse/pse_op/monotonic_counter_database_sqlite_access_hw_mc.cpp
[9] https://github.com/01org/linux-sgx/blob/master/psw/ae/pse/pse_op/psda_service.cpp
[10] https://github.com/01org/linux-sgx/blob/master/psw/ae/aesm_service/source/pse_op/pse_op_psda_ocall.cpp
[11] https://github.com/01org/linux-sgx/blob/master/psw/ae/aesm_service/source/pse_op/PSEClass.cpp
[12] https://github.com/intel/dynamic-application-loader-host-interface
[13] https://www.slideshare.net/codeblue_jp/igor-skochinsky-enpub
[14] https://github.com/01org/linux-sgx/blob/master/psw/ae/aesm_service/source/pse_op/PSDAService.cpp
[15] https://software.intel.com/en-us/forums/intel-business-client-software-development/topic/696357

Posted in Security | Tagged , , , | Leave a comment

Why list and charlist are confusing in Elixir

This post talks about list and charlist in Elixir, and discusses one specific issue in Elixir.

Let’s look at the confusing thing in Elixir:

iex(2)> a = [7]
‘\a’
iex(3)> b = ‘\a’
‘\a’
iex(4)> a == b
true
iex(5)> is_list(a)
true
iex(6)> is_list(b)
true

First, a list is shown as a charlist in iex; second, (without surprise if we believe in iex), the charlist equals the list; last, both the list and the charlist are lists.

Why? Because charlist in represented as a list with code point(s) inside, and those encoding scheme is UTF-8 (rather than UTF-16 in Erlang). To convince yourself:

iex(7)> Enum.map(0..255, fn x -> [x] end)
[[0], [1], [2], [3], [4], [5], [6], ‘\a’, ‘\b’, ‘\t’, ‘\n’, ‘\v’, ‘\f’, ‘\r’,
[14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26],
‘\e’, [28], [29], [30], [31], ‘ ‘, ‘!’, ‘”‘, ‘#’, ‘$’, ‘%’, ‘&’, ‘\”, ‘(‘,
‘)’, ‘*’, ‘+’, ‘,’, ‘-‘, ‘.’, ‘/’, ‘0’, ‘1’, …]

Looks familiar? Yes, those are ASCII codes. So the conclusion seems to be that list and charlist are the same in Elixir. Well, this is mostly right, as we have seen, but not exactly right. Run the file below to inspect into list and charlist internal using pry:

require IEx

value_list = [7]
value_list2 = [1]
value_char = '\a'

IEx.pry

daveti@Daves-MacBook-Pro:~/elixir/list_issue$ iex list_issue.exs
Erlang/OTP 20 [erts-9.0] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Request to pry #PID<0.83.0> at list_issue.exs:7

5: value_char = ‘\a’
6:
7: IEx.pry

Allow? [Yn]
Interactive Elixir (1.5.1) – press Ctrl+C to exit (type h() ENTER for help)
pry(1)> i value_list
Term
‘\a’
Data type
List
Description
This is a list of integers that is printed as a sequence of characters
delimited by single quotes because all the integers in it represent valid
ASCII characters. Conventionally, such lists of integers are referred to as
“charlists” (more precisely, a charlist is a list of Unicode codepoints,
and ASCII is a subset of Unicode).
Raw representation
[7]
Reference modules
List
Implemented protocols
IEx.Info, Collectable, Enumerable, Inspect, List.Chars, String.Chars
pry(2)>
nil
pry(3)> i value_list2
Term
[1]
Data type
List
Reference modules
List
Implemented protocols
IEx.Info, Collectable, Enumerable, Inspect, List.Chars, String.Chars
pry(4)>
nil
pry(5)> i value_char
Term
‘\a’
Data type
List
Description
This is a list of integers that is printed as a sequence of characters
delimited by single quotes because all the integers in it represent valid
ASCII characters. Conventionally, such lists of integers are referred to as
“charlists” (more precisely, a charlist is a list of Unicode codepoints,
and ASCII is a subset of Unicode).
Raw representation
[7]
Reference modules
List
Implemented protocols
IEx.Info, Collectable, Enumerable, Inspect, List.Chars, String.Chars
pry(6)>

Now you see the difference between list and charlist. Both have data type list, which explains why they are equivalent. Moreover, when the numbers inside the list could be interpreted as valid ASCII code points, Elixir shows it as a charlist rather than a list, although the raw presentation is always a list.

Why this is confusing? Let’s say you wanna print out a list, e.g., [7], but what Elixir displays is ‘\a’. Although it does not affect the internal computation, e.g., map, this display issue is really annoying and confusing. In this case, Erlang seems doing sane:

Eshell V9.0 (abort with ^G)
1> [7].
[7]
2> ‘\a’.
a
3> [7] == ‘\a’.
false

Posted in Programming | Tagged , , , , | Leave a comment

A PoC of DoS attack in Elixir Actor Model

The naive way of using the Actor model in Elixir is using “receive” in a loop, which is then “spawn”d as a Erlang process. Unfortunately, a potential DoS attack could happen if the pattern matching is not coded carefully with security in mind. This post looks into this potential vulnerability, and provides a PoC implementation.

A typical “receive” block in Elixir provides different pattern matchings, like below:

receive do
      {sender_pid, {:msg, _msg}} ->
        send(sender_pid, {:ok, "gotcha"})
      _ ->
        IO.puts "WTF"
    end

Following the same order of matchings, the message goes through each matching one by one till finding a match. Otherwise, “_” (matching all) catchs whatever received. While this defensive matching all seems optional, it is vital to the security of the Erlang VM. As turns out, the message would stay in the mailbox forever, if no matching is able to “catch” the message. The potential result is DoS attack, caused by OOM (out of memory) of the mailbox.

To observe this potential DoS attack, please download the PoC (https://github.com/daveti/edos). As shown below, unmatched messages keep accumulating in the mailbox (message queue) when running edos:

msg 24645 sent
worker msg queue len {:message_queue_len, 24645}
msg 24646 sent
worker msg queue len {:message_queue_len, 24646}
msg 24647 sent
worker msg queue len {:message_queue_len, 24647}
msg 24648 sent
^C

“top” also shows the memory usage (of beam.smp) keep increasing as more messages are sent to the mailbox. This DoS could turn into DDoS easily in a distributed environment.

As usual, Elixir compiler does not complain about this during the compilation time, although it would be really useful for the compiler to give some warnings. Nothing is gonna happen in runtime as well, except the memory consumption of the mailbox, which is “invisible” to the user. Even worse, dialyxir also reports nothing regarding this potential issue.

One more thing needs to verify is whether a GenServer implementation without handle_info/2 would also cause DoS in the mailbox. I suspect it the same as above.

Posted in Programming, Security, Static Code Analysis | Tagged , , , , | Leave a comment

A pitfall of GenServer programming in Elixir

OTP as a programming platform/framework, empowers not only Erlang, but also Elixir. This post looks into a pitfall when programming GenServer in Elixir, and provides a potential solution.

All callbacks in the GenServer behavior have a limited and pre-defined possible return values. Presumbly, one should pick up the desired return value from valid ones. One example for handle_call/3 looks like below:

{:reply, reply, new_state}

But what if we screw the return value, e.g., by mistakes like below,

{:reply, reply, :ok, new_state}

where atom :ok is added as an extra parameter of the return value. One would hope the Elixir compiler would catch this simple mistake. Nevertheless, neither mix compile or mix escript.build would report this bug. And yes, this bug will be triggered when handle_call/3 is called with a “Bad Argument” exception thrown out.

While “let it crash” and “unit testing” may be good reasons for Elixir compiler to treat this bug as a runtime one rather than a compiling one, this also means this simple bug won’t be discovered util deployment if unit testing is not covered properly. And once it happens, it keeps crashing “thanks” to the fault-tolerant feature using Supervisor…

Without changes in the Elixir compiler, it is still possible to detect this bug, using dialyxir (yes, Elixir version of dialyzer for sure). “mix dialyxir” shows the detection of this bug:

done in 0m1.52s
lib/dig.ex:9: The inferred return type of handle_call/3 ({‘reply’,{‘ok’,{_,_,_,_,_}},’ok’,#{‘__struct__’:=’Elixir.Digdb’, ‘node’:=_, ‘prefix’:=_, ‘remote’:=_, ‘unit_base’:=_, ‘unit_range’:=_, ‘worker_idx’:=_, ‘zero_num’:=_}}) has nothing in common with {‘noreply’,_} | {‘noreply’,_,’hibernate’ | ‘infinity’ | non_neg_integer()} | {‘reply’,_,_} | {‘stop’,_,_} | {‘reply’,_,_,’hibernate’ | ‘infinity’ | non_neg_integer()} | {‘stop’,_,_,_}, which is the expected return type for the callback of the ‘Elixir.GenServer’ behaviour
done (warnings were emitted)

Posted in Programming, Static Code Analysis | Tagged , , , , | Leave a comment

Pitfalls in negative indexing in C

Negative indexing in C, such as a[-1], is legit, although rarely used. There are reasons (pitfalls) why negative indexing is not recommended. This post discusses these pitfalls when using negative indexing (for fun).

1. Negative indexing within the bound

#include <stdio.h>
#include <string.h>

int main()
{
	int a[10];
	int *p;
	int i;

	for (i = 0; i < 10; i++)
		a[i] = 0x12345678;
	p = &a[9];

	for (i = 0; i < 10; i++) {
		printf("a[%d]=0x%x, %p\n", i, a[i], &a[i]);
		printf("p[-%d]=0x%x, %p\n", i, p[-i], &p[-i]);
	}

	return 0;
}

In the code above, we let pointer p pointing to the end of array a, and print out each element in a using negative indexing with the help of p.

[daveti@daveti c]$ ./negidx0
a[0]=0x12345678, 0x7ffd50edad40
p[-0]=0x12345678, 0x7ffd50edad64
a[1]=0x12345678, 0x7ffd50edad44
p[-1]=0x12345678, 0x7ffd50edad60
a[2]=0x12345678, 0x7ffd50edad48
p[-2]=0x12345678, 0x7ffd50edad5c
a[3]=0x12345678, 0x7ffd50edad4c
p[-3]=0x12345678, 0x7ffd50edad58
a[4]=0x12345678, 0x7ffd50edad50
p[-4]=0x12345678, 0x7ffd50edad54
a[5]=0x12345678, 0x7ffd50edad54
p[-5]=0x12345678, 0x7ffd50edad50
a[6]=0x12345678, 0x7ffd50edad58
p[-6]=0x12345678, 0x7ffd50edad4c
a[7]=0x12345678, 0x7ffd50edad5c
p[-7]=0x12345678, 0x7ffd50edad48
a[8]=0x12345678, 0x7ffd50edad60
p[-8]=0x12345678, 0x7ffd50edad44
a[9]=0x12345678, 0x7ffd50edad64
p[-9]=0x12345678, 0x7ffd50edad40

The output proves what we would expect. Subcriptor [] simply means that E1[E2]=*((E1)+(E2)). So it does not matter whether E2 is positive or negative, and the compiler will take care of pointer type to generate the correct the result pointer (say, whether it is int or char). [1]

2. Negative indexing out of the bound

Things get complicated when we try to reference var a from var b, assuming that both a and b are next to each other…

#include <stdio.h>
#include <string.h>

int main()
{
	char a[10];
	char b[10];

	memset(a, 0x0, 10);
	memset(b, 0x1, 10);

	int i;
	printf("a=%p, b=%p\n", a, b);
	for (i = 0; i < 10; i++) {
		printf("b[%d]=%d, %p\n", i, b[i], &b[i]);
		printf("b[-%d]=%d, %p\n", i, b[-i], &b[-i]);
	}

	return 0;
}

If you, like myself, expect to see some 0/1 interleaved, then the output would let you down:

[daveti@daveti c]$ ./negidx1
a=0x7ffc4c88ecd2, b=0x7ffc4c88ecc8
b[0]=1, 0x7ffc4c88ecc8
b[-0]=1, 0x7ffc4c88ecc8
b[1]=1, 0x7ffc4c88ecc9
b[-1]=0, 0x7ffc4c88ecc7
b[2]=1, 0x7ffc4c88ecca
b[-2]=0, 0x7ffc4c88ecc6
b[3]=1, 0x7ffc4c88eccb
b[-3]=0, 0x7ffc4c88ecc5
b[4]=1, 0x7ffc4c88eccc
b[-4]=0, 0x7ffc4c88ecc4
b[5]=1, 0x7ffc4c88eccd
b[-5]=0, 0x7ffc4c88ecc3
b[6]=1, 0x7ffc4c88ecce
b[-6]=64, 0x7ffc4c88ecc2
b[7]=1, 0x7ffc4c88eccf
b[-7]=6, 0x7ffc4c88ecc1
b[8]=1, 0x7ffc4c88ecd0
b[-8]=16, 0x7ffc4c88ecc0
b[9]=1, 0x7ffc4c88ecd1
b[-9]=0, 0x7ffc4c88ecbf

With all the addresses printed out, I guess you have figured sth out. The pitfall here is the actual memory layout of a and b – since stack grows downwards (from high to low), a has higher starting address than b. That is being said, b[-x] would never reach a, which can be seen from the addresses printed out. Therefore, if we wanna reference a from b, we need positive indexing out of bound, e.g.,

#include <stdio.h>
#include <string.h>

int main()
{
	char a[10];
	char b[10];

	memset(a, 0x0, 10);
	memset(b, 0x1, 10);

	int i;
	printf("a=%p, b=%p\n", a, b);
	for (i = 0; i < 20; i++) {
		printf("b[%d]=%d, %p\n", i, b[i], &b[i]);
	}

	return 0;
}

And here is the output:

[daveti@daveti c]$ ./negidx2
a=0x7ffdbf9bc682, b=0x7ffdbf9bc678
b[0]=1, 0x7ffdbf9bc678
b[1]=1, 0x7ffdbf9bc679
b[2]=1, 0x7ffdbf9bc67a
b[3]=1, 0x7ffdbf9bc67b
b[4]=1, 0x7ffdbf9bc67c
b[5]=1, 0x7ffdbf9bc67d
b[6]=1, 0x7ffdbf9bc67e
b[7]=1, 0x7ffdbf9bc67f
b[8]=1, 0x7ffdbf9bc680
b[9]=1, 0x7ffdbf9bc681
b[10]=0, 0x7ffdbf9bc682
b[11]=0, 0x7ffdbf9bc683
b[12]=0, 0x7ffdbf9bc684
b[13]=0, 0x7ffdbf9bc685
b[14]=0, 0x7ffdbf9bc686
b[15]=0, 0x7ffdbf9bc687
b[16]=0, 0x7ffdbf9bc688
b[17]=0, 0x7ffdbf9bc689
b[18]=0, 0x7ffdbf9bc68a
b[19]=0, 0x7ffdbf9bc68b

Or, if we wanna reference b from a, we could use negative indexing out of bound, e.g.,

#include <stdio.h>
#include <string.h>

int main()
{
	char a[10];
	char b[10];

	memset(a, 0x0, 10);
	memset(b, 0x1, 10);

	int i;
	printf("a=%p, b=%p\n", a, b);
	for (i = 0; i < 10; i++) {
		printf("a[%d]=%d, %p\n", i, a[i], &a[i]);
		printf("a[-%d]=%d, %p\n", i, a[-i], &a[-i]);
	}

	return 0;
}

And here is the output:

[daveti@daveti c]$ ./negidx3
a=0x7ffece838602, b=0x7ffece8385f8
a[0]=0, 0x7ffece838602
a[-0]=0, 0x7ffece838602
a[1]=0, 0x7ffece838603
a[-1]=1, 0x7ffece838601
a[2]=0, 0x7ffece838604
a[-2]=1, 0x7ffece838600
a[3]=0, 0x7ffece838605
a[-3]=1, 0x7ffece8385ff
a[4]=0, 0x7ffece838606
a[-4]=1, 0x7ffece8385fe
a[5]=0, 0x7ffece838607
a[-5]=1, 0x7ffece8385fd
a[6]=0, 0x7ffece838608
a[-6]=1, 0x7ffece8385fc
a[7]=0, 0x7ffece838609
a[-7]=1, 0x7ffece8385fb
a[8]=0, 0x7ffece83860a
a[-8]=1, 0x7ffece8385fa
a[9]=0, 0x7ffece83860b
a[-9]=1, 0x7ffece8385f9

3. Alignment

Things get more complicated when there is memory alignment in the stack, which is pretty common.

#include <stdio.h>
#include <string.h>

typedef struct _total_made_up {
	char c0;
	int i;
	char c1[10];
} tmu;

int main()
{
	tmu a;
	tmu b;

	memset(&a, 0x0, sizeof(tmu));
	memset(&b, 0x1, sizeof(tmu));

	int i;
	printf("a=%p, a.c0=%p, a.i=%p, a.c1=%p, sizeof(a)=%d\n",
		&a, &(a.c0), &(a.i), &(a.c1), sizeof(a));
	printf("b=%p, b.c0=%p, b.i=%p, b.c1=%p, sizeof(b)=%d\n",
		&b, &(b.c0), &(b.i), &(b.c1), sizeof(b));
	for (i = 0; i < 10; i++) {
		printf("a.c1[%d]=%d, %p\n", i, (a.c1)[i], &(a.c1)[i]);
		printf("a.c1[-%d]=%d, %p\n", i, (a.c1)[-i], &(a.c1)[-i]);
	}

	return 0;
}

Here is the output:

[daveti@daveti c]$ ./negidx4
a=0x7ffc4bf3c8e0, a.c0=0x7ffc4bf3c8e0, a.i=0x7ffc4bf3c8e4, a.c1=0x7ffc4bf3c8e8, sizeof(a)=20
b=0x7ffc4bf3c8c0, b.c0=0x7ffc4bf3c8c0, b.i=0x7ffc4bf3c8c4, b.c1=0x7ffc4bf3c8c8, sizeof(b)=20
a.c1[0]=0, 0x7ffc4bf3c8e8
a.c1[-0]=0, 0x7ffc4bf3c8e8
a.c1[1]=0, 0x7ffc4bf3c8e9
a.c1[-1]=0, 0x7ffc4bf3c8e7
a.c1[2]=0, 0x7ffc4bf3c8ea
a.c1[-2]=0, 0x7ffc4bf3c8e6
a.c1[3]=0, 0x7ffc4bf3c8eb
a.c1[-3]=0, 0x7ffc4bf3c8e5
a.c1[4]=0, 0x7ffc4bf3c8ec
a.c1[-4]=0, 0x7ffc4bf3c8e4
a.c1[5]=0, 0x7ffc4bf3c8ed
a.c1[-5]=0, 0x7ffc4bf3c8e3
a.c1[6]=0, 0x7ffc4bf3c8ee
a.c1[-6]=0, 0x7ffc4bf3c8e2
a.c1[7]=0, 0x7ffc4bf3c8ef
a.c1[-7]=0, 0x7ffc4bf3c8e1
a.c1[8]=0, 0x7ffc4bf3c8f0
a.c1[-8]=0, 0x7ffc4bf3c8e0
a.c1[9]=0, 0x7ffc4bf3c8f1
a.c1[-9]=0, 0x7ffc4bf3c8df

If we compare the starting address of a and b, we can find a 32-byte difference. The size of the struct is 20 bytes, which means there is a 12-byte gap between the end of b and the start of a due to padding (alignment). Even worse, there is also padding happened within the structure, where the c0 is padded with 3 bytes to alignment with the 4-byte int (i), and c1 is padded with another 2 bytes in the end. Therefore, if we want to use a.c1 to reference b.c1 with negative indexing, the index should start from -22 (8 bytes in a, and 12 bytes in the gap, and 2 bytes padding in b.c1). You don’t see this number coming, do you:)

#include <stdio.h>
#include <string.h>

typedef struct _total_made_up {
	char c0;
	int i;
	char c1[10];
} tmu;

int main()
{
	tmu a;
	tmu b;

	memset(&a, 0x0, sizeof(tmu));
	memset(&b, 0x0, sizeof(tmu));
	b.c0 = 1;
	b.i = 1;
	memset(&b.c1, 0x1, 10);

	int i;
	printf("a=%p, a.c0=%p, a.i=%p, a.c1=%p, sizeof(a)=%d\n",
		&a, &(a.c0), &(a.i), &(a.c1), sizeof(a));
	printf("b=%p, b.c0=%p, b.i=%p, b.c1=%p, sizeof(b)=%d\n",
		&b, &(b.c0), &(b.i), &(b.c1), sizeof(b));
	for (i = 0; i < 10; i++) {
		printf("a.c1[%d]=%d, %p\n", i, (a.c1)[i], &(a.c1)[i]);
		printf("a.c1[-%d-22]=%d, %p\n", i, (a.c1)[-i-22], &(a.c1)[-i-22]);
	}

	return 0;
}

[daveti@daveti c]$ ./negidx5
a=0x7ffce988ccf0, a.c0=0x7ffce988ccf0, a.i=0x7ffce988ccf4, a.c1=0x7ffce988ccf8, sizeof(a)=20
b=0x7ffce988ccd0, b.c0=0x7ffce988ccd0, b.i=0x7ffce988ccd4, b.c1=0x7ffce988ccd8, sizeof(b)=20
a.c1[0]=0, 0x7ffce988ccf8
a.c1[-0-22]=0, 0x7ffce988cce2
a.c1[1]=0, 0x7ffce988ccf9
a.c1[-1-22]=1, 0x7ffce988cce1
a.c1[2]=0, 0x7ffce988ccfa
a.c1[-2-22]=1, 0x7ffce988cce0
a.c1[3]=0, 0x7ffce988ccfb
a.c1[-3-22]=1, 0x7ffce988ccdf
a.c1[4]=0, 0x7ffce988ccfc
a.c1[-4-22]=1, 0x7ffce988ccde
a.c1[5]=0, 0x7ffce988ccfd
a.c1[-5-22]=1, 0x7ffce988ccdd
a.c1[6]=0, 0x7ffce988ccfe
a.c1[-6-22]=1, 0x7ffce988ccdc
a.c1[7]=0, 0x7ffce988ccff
a.c1[-7-22]=1, 0x7ffce988ccdb
a.c1[8]=0, 0x7ffce988cd00
a.c1[-8-22]=1, 0x7ffce988ccda
a.c1[9]=0, 0x7ffce988cd01
a.c1[-9-22]=1, 0x7ffce988ccd9

3. Summary

We have just talked about how messed up negative indexing could be on the stack. It is also possible to do this on the heap, which is more error-prone given the memory management under the hood (malloc/kmalloc). To correctly use negative (or out of bound) indexing, one needs to know exactly what the memory layout looks like, this could be hard for stack variables, even harder for heap ones. For normal programming, in most cases, you do not need negative indexing. As a matter of fact, the only normal usage I have seen so far is from ARMv3-M software development manual, where negative indexing is used to retrieve the SVC instruction number, which is saved on the stack by the CPU core during the supervisor call.

Reference:

[1] https://stackoverflow.com/questions/3473675/are-negative-array-indexes-allowed-in-c

Posted in Programming | Tagged , | Leave a comment

USB gadget functionalities in Android

I started working on Android stuffs this summer. While I mainly work on the USB layer within the Linux kernel, I do sometimes need to look into the Android framework, to see if I could achieve my goal from the Android user space. One of the questions I have got is to find the USB configuration of the phone (e.g., MTP, adb, and etc) when connected with the host machine. And here comes this post.

1. lsusb

The most straightforward way to have the USB information of the phone is to plug it into a Linux box, and run ‘lsusb -t‘ or ‘lsusb -v‘. Things get annoying when I tried to pull similar information from the phone directly.

2. adb

If you have adb with root permission, go to ‘/sys/class/android_usb/android0‘. You will find all the kernel USB gadget functionalities builtin, and the configurations, e.g., which functionalities are enabled in the phone right now. This won’t work if adb runs as ‘shell’ user.

3. USB Options

When the phone is connected with the host machine, the USB option menu would pop up providing more options: charging, MTP, PTP, and etc. This should be the easiest way in most cases, however, would fail if you are not playing with your own phone (e.g., pin protected, and yes, I am talking about “remote” phones in the device farm). It is also possible that some USB gadget functionalities are NOT exposed to the menu at all, e.g., a hiddent USB CDC/ACM functionality.

4. USB Settings

Some vendors provide secret dial codes to trigger a USB Setting menu with some combinations of different gadget functionalities for testing/debugging purpose. E.g., for Samsung Galaxy phones, use ‘*#0808#‘. Let me know if you have similar dial codes from other vendors.

5. UsbManager

Now it is time to look into the Android framework. Can we write an app to show the USB configuration/information? UsbManager.java[1] is the implementation in Android to control the USB hardware. To differentiate the USB host mode and the USB gadget mode, the Android APIs have USB Host APIs and USB Accessory API[2], both of which are managed by the UsbManager. For instance, getDeviceList() in the UsbManager returns a list of UsbDevice connected with the phone. As expected, UsbDevice class is the instantiation of a USB device defined by the USB spec. We could go and find the configuration, interfaces, and endpoints, just like what we could do with libusb. Accordingly, getAccessoryList() returns a list of UsbAccessory the phone behaves. Here is the problem – UsbAccessory is NOT abstracted as UsbDevice! The only information from UsbAccessory you could get is device information, such as manufacturer, product, serial number, and etc. In short, no interface-level information in the UsbAccessory!

Nevertheless, UsbManager may still be the answer if we look the latest implementation[3]. ‘isFunctionEnabled()’ and ‘containsFunction()’ seem to be promising to detect hiddent functionalities, which may not be shown in the USB option menu. Unfortunately, this still depends on the vendor-specific customization – whether or not the USB function is exposed to the Android framework (platform), and managed by the UsbManager.

Refs:
[1] https://developer.android.com/reference/android/hardware/usb/UsbManager.html
[2] https://developer.android.com/guide/topics/connectivity/usb/index.html
[3] https://github.com/android/platform_frameworks_base/blob/master/core/java/android/hardware/usb/UsbManager.java

Posted in OS | Tagged , , | Leave a comment

Understanding kcov – play with -fsanitize-coverage=trace-pc from the user space

kcov is a kernel feature used to support syzkaller[1]. To provide the code coverage information from the kernel itself, the GCC compiler was patched to instrument the kernel image[2]. The kernel itself was also patched to enable this feature where is propriate[3]. This post tries to reproduce the essense of kcov in the user space, in the hope of a better understanding of kcov in general.

1. -fsanitize-coverage=trace-pc

After (>=) 6.0, GCC supports this new feature/flag, which instruments every basic block generated by GCC with function “trace-pc”. This “trace-pc” function is provided by the user, and should have the name as “__sanitizer_cov_trace_pc“. For kcov, this function is implemented in kernel/kcov.c. By the time of writing, there is no user-space example available using this new flag, primarily because this feature is mainly designed for kernel/syzkaller, and gcov is available for user-space programs already. Nevertheless, we will show how to play this in the user space:)

2. First try

Let’s create a “non-trivial” testing file and a makefile, as shown below:

#include <stdio.h>
#include <string.h>

void __sanitizer_cov_trace_pc(void)
{
	printf("code instrumented...\n");
}

static int fun2(void)
{
	printf("fun2\n");
	return 0;
}

static void fun1(int a)
{
	fun2();
	printf("fun1: a[%d]\n", a);
}

int main(void)
{
	int a = 10;
	fun1(a);
	return 0;
}
CC=gcc
CFLAGS=-g -ggdb -fsanitize-coverage=trace-pc
LIBS=
DEPS=
OBJ=test.o

%.o: %.c $(DEPS)
	$(CC) -c -o $@ $< $(CFLAGS) $(LIBS)

test : $(OBJ)
	$(CC) -o $@ $^ $(CFLAGS) $(LIBS)

.PHONY: clean

clean:
	rm -f *.o test

Function “__sanitizer_cov_trace_pc” is defined in test.c file. The Makefile also enables the “-fsanitize-coverage” flag. We compile it, run it, and as you expected, get coredump. What the heck? Disassembling the binary gives us some hints:

0000000000400546 <__sanitizer_cov_trace_pc>::
  400546:	55                   	push   %rbp
  400547:	48 89 e5             	mov    %rsp,%rbp
  40054a:	e8 f7 ff ff ff       	callq  400546 <__sanitizer_cov_trace_pc>
  40054f:	bf 80 06 40 00       	mov    $0x400680,%edi
  400554:	e8 d7 fe ff ff       	callq  400430 <puts@plt>
  400559:	90                   	nop
  40055a:	5d                   	pop    %rbp
  40055b:	c3                   	retq

Besides main, fun1, fun2 are instrumented by __sanitizer_cov_trace_pc, __sanitizer_cov_trace_pc itself is also instrumented by the compiler, which creates a recursive bomb, and stack overflow eventually. So, we need to tell GCC not to instrument the instrumenting function itself.

3. Second try

If we look at the kernel patch[3] again, the __sanitizer_cov_trace_pc is defined with “notrace” decorator, which is essentially an attribute to let GCC skip instrumentation for the decorated function. All we need to do is to update our __sanitizer_cov_trace_pc function in test.c with this decorator:

#define notrace __attribute__((no_instrument_function))

void notrace __sanitizer_cov_trace_pc(void)
{
	printf("code instrumented...\n");
}

Compile, run, and unfortunately coredump again. Disassembly shows the recursive bomb is still there. WTF!

4. Do it again

If we look at the old makefile, we have __sanitizer_cov_trace_pc defined inside the test.c and shared the same compilation flags as other functions, including “-fsanitize-coverage”. It is time to split the function into a separated file and use a different set of compilation flags. Now we have trace.c, trace.h, test.c, and a new makefile, as shown below:

#include <stdio.h>

#define notrace __attribute__((no_instrument_function))

void notrace __sanitizer_cov_trace_pc(void)
{
	printf("code instrumented...\n");
}
#define notrace __attribute__((no_instrument_function))
void notrace __sanitizer_cov_trace_pc(void);
#include <stdio.h>
#include <string.h>
#include "trace.h"

static int fun2(void)
{
	printf("fun2\n");
	return 0;
}

static void fun1(int a)
{
	fun2();
	printf("fun1: a[%d]\n", a);
}

int main(void)
{
	int a = 10;
	fun1(a);
	return 0;
}
CC=gcc
CFLAGS=-g -ggdb -fsanitize-coverage=trace-pc
LIBS=
DEPS=trace.h
OBJ=test.o trace.o

all: test

trace.o: trace.c
	$(CC) -c -o $@ $<

test.o: test.c
	$(CC) -c -o $@ $< $(CFLAGS) $(LIBS)

test : $(OBJ)
	$(CC) -o $@ $^ $(CFLAGS) $(LIBS)

.PHONY: clean

clean:
	rm -f *.o test

Compile, run, and it works:

[daveti@daveti fuzz]$ ./test
code instrumented...
code instrumented...
code instrumented...
fun2
code instrumented...
fun1: a[10]
code instrumented...

5. Trace PC

The final missing part of our user-space implementation comparing to kcov is the PC tracing functionality. Fortunately, GCC already provides a builtin command to retrieve the return address pushed on the stack. As what linux kernel does, we update trace.c to enable PC tracing:

#include <stdio.h>

#define notrace	__attribute__((no_instrument_function))
#define _RET_IP_	(unsigned long)__builtin_return_address(0)

void notrace __sanitizer_cov_trace_pc(void)
{
	//printf("code instrumented...\n");
	printf("return pc [0x%x]\n", _RET_IP_);
}

Compile, run, and use addr2line to get debugging information from these addresses. Doesn’t it look almost the same as kcov!

[daveti@daveti fuzz2]$ ./test
return pc [0x4005ab]
return pc [0x400581]
return pc [0x400554]
fun2
return pc [0x400568]
fun1: a[10]
return pc [0x4005c6]
[daveti@daveti fuzz2]$ addr2line -e ./test 0x4005ab
/home/daveti/c/fuzz2/test.c:19

6. Notes

One observation we can easily find is the overhead of this instrumentation. Since the instrumentation happens per basic block instead of functions, it is trivial to have bigger number of instrumentations than the number of actual function calls. For example, in the our example, we have 3 functions, and 5 instrumentations. This is also the reason why not every kernel subsystem/component enables kcov (besides, this instrumentation could break kernel booting). Again, this post tries to mimic what kcov does from the user space. For real user space coverage instrumentation, there is gcov[4] already.

[1] https://github.com/google/syzkaller
[2] http://gcc.1065356.n8.nabble.com/Re-Add-fuzzing-coverage-support-td1212322.html
[3] https://github.com/torvalds/linux/commit/5c9a8750a6409c63a0f01d51a9024861022f6593
[4] https://gcc.gnu.org/onlinedocs/gcc/Gcov.html

Posted in OS, Security, Stuff about Compiler | Tagged , , | Leave a comment