In the Linux kernel, the following vulnerability has been resolved:
net/sched: act_ct: Only release RCU read lock after ct_ft
When looking up a flow table in act_ct in tcf_ct_flow_table_get(), rhashtable_lookup_fast() internally opens and closes an RCU read critical section before returning ct_ft. The tcf_ct_flow_table_cleanup_work() can complete before refcount_inc_not_zero() is invoked on the returned ct_ft resulting in a UAF on the already freed ct_ft object. This vulnerability can lead to privilege escalation.
Analysis from zdi-disclosures@trendmicro.com: When initializing act_ct, tcf_ct_init() is called, which internally triggers tcf_ct_flow_table_get().
static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params)
{ struct zones_ht_key key = { .net = net, .zone = params->zone }; struct tcf_ct_flow_table *ct_ft; int err = -ENOMEM;
mutex_lock(&zones_mutex); ct_ft = rhashtable_lookup_fast(&zones_ht, &key, zones_params); // [1] if (ct_ft && refcount_inc_not_zero(&ct_ft->ref)) // [2] goto out_unlock; ... }
static __always_inline void *rhashtable_lookup_fast( struct rhashtable *ht, const void *key, const struct rhashtable_params params) { void *obj;
rcu_read_lock(); obj = rhashtable_lookup(ht, key, params); rcu_read_unlock();
return obj; }
At [1], rhashtable_lookup_fast() looks up and returns the corresponding ct_ft from zones_ht . The lookup is performed within an RCU read critical section through rcu_read_lock() / rcu_read_unlock(), which prevents the object from being freed. However, at the point of function return, rcu_read_unlock() has already been called, and there is nothing preventing ct_ft from being freed before reaching refcount_inc_not_zero(&ct_ft->ref) at [2]. This interval becomes the race window, during which ct_ft can be freed.
Free Process:
tcf_ct_flow_table_put() is executed through the path tcf_ct_cleanup() call_rcu() tcf_ct_params_free_rcu() tcf_ct_params_free() tcf_ct_flow_table_put().
static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft) { if (refcount_dec_and_test(&ct_ft->ref)) { rhashtable_remove_fast(&zones_ht, &ct_ft->node, zones_params); INIT_RCU_WORK(&ct_ft->rwork, tcf_ct_flow_table_cleanup_work); // [3] queue_rcu_work(act_ct_wq, &ct_ft->rwork); } }
At [3], tcf_ct_flow_table_cleanup_work() is scheduled as RCU work
static void tcf_ct_flow_table_cleanup_work(struct work_struct *work)
{ struct tcf_ct_flow_table *ct_ft; struct flow_block *block;
ct_ft = container_of(to_rcu_work(work), struct tcf_ct_flow_table, rwork); nf_flow_table_free(&ct_ft->nf_ft); block = &ct_ft->nf_ft.flow_block; down_write(&ct_ft->nf_ft.flow_block_lock); WARN_ON(!list_empty(&block->cb_list)); up_write(&ct_ft->nf_ft.flow_block_lock); kfree(ct_ft); // [4]
module_put(THIS_MODULE); }
tcf_ct_flow_table_cleanup_work() frees ct_ft at [4]. When this function executes between [1] and [2], UAF occurs.
This race condition has a very short race window, making it generally difficult to trigger. Therefore, to trigger the vulnerability an msleep(100) was inserted after[1]
|