This chapter starts from the observation that metaphoric understandings expressed monomodally through gesture tend to rely on “primary metaphors” (Grady 1997). Arguing that gestures draw on basic, experientially motivated, embodied construal operations, we detail how primary scenes and subscenes (Grady & Johnson 2002), image and force schemas, metonymy, and frames (Fillmore 1982) interact in situated meaning-making. We propose that shifting the focus from object-oriented schemas, source domains, and mappings to what we call “source actions” and “embodied action frames” allows us to account for the pragmatically oriented nature and specific mediality of communicative gestural acts integrated into natural multimodal discourse. We argue that coverbal gestures recruit frame structures metonymically, singling out elements of “scenes” (Fillmore 1977), especially those underpinning correlated metaphoric meanings. We support our theoretical claims with evidence from neuroscientific studies and outline how a frame-based approach may open avenues for further research into embodied cognition and multimodal discourse processes.