The Leadership Development Paradox: Big Investments, Uncertain Returns
Organizations spend an estimated $60 billion annually on leadership development in the United States alone, according to figures from the Association for Talent Development. Yet research consistently shows that most organizations struggle to demonstrate clear returns on that investment. A landmark study by Beer, Finnström, and Schrader (2016) published in Harvard Business Review found that the vast majority of corporate leadership programs fail to produce lasting behavioral change — not because the content is poor, but because the surrounding systems of measurement and reinforcement are absent.
The paradox is clear: leadership development is universally recognized as critical to organizational success, yet it remains one of the most difficult investments to measure and optimize. The missing ingredient, more often than not, is assessment data. Without rigorous pre-program baselines, mid-program checkpoints, and post-program outcome measures, even the most beautifully designed leadership curricula devolve into expensive events rather than strategic business investments.
The following five warning signs indicate that a leadership development program is operating in the dark — and that integrating psychometric assessment data could transform its effectiveness, credibility, and measurable impact on the business.
Sign #1: There Is No Baseline — So There Is No Way to Measure Growth
Perhaps the most fundamental flaw in leadership development programs is launching without establishing a clear starting point. If an organization cannot quantify where leaders stand on critical competencies before development begins, there is simply no way to quantify growth afterward. Without a baseline, every claim about program effectiveness is speculation rather than evidence.
This problem is more pervasive than most L&D professionals realize. Many programs begin with a kickoff event, a keynote speaker, or an experiential exercise — all of which can be engaging and even inspiring — but none of which provide the data needed to track individual or cohort-level development over time. The result is a program that feels good in the moment but leaves stakeholders asking uncomfortable questions when budget season arrives: What did we actually accomplish?
The solution is straightforward: administer validated assessments at the outset of the program to create an objective, quantifiable snapshot of each participant's leadership profile. A multi-rater instrument like the Achieving Leader 360 (AL360) is particularly powerful for this purpose because it captures not just self-perception but also the perspectives of supervisors, peers, and direct reports across six core leadership domains — from Communication & Relations to Adaptive Leadership. When the same instrument is re-administered at the conclusion of the program (or at defined intervals afterward), the organization gains a rigorous pre/post comparison that transforms anecdotal impressions into defensible evidence of growth.
Research on self-other agreement in leadership assessment (Atwater & Yammarino, 1992; Fleenor, Smither, Atwater, Braddy, & Sturm, 2010) demonstrates that gaps between self-ratings and observer ratings are among the most powerful catalysts for behavioral change. Establishing this baseline does not merely serve measurement purposes — it actively accelerates development by giving leaders a concrete, data-rich picture of how their behavior is perceived by those around them.
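The self-other gap described above is simple to compute once rater data is collected. The sketch below uses invented 1-to-5 ratings and a made-up data layout for a single competency; it is illustrative only and does not reflect the actual AL360 report format or scoring rules:

```python
from statistics import mean

# Hypothetical 1-5 ratings on one competency for a single leader.
# Rater groups and scale are illustrative, not the AL360's data format.
self_rating = 4.5
observer_ratings = {
    "supervisor": [3.5],
    "peers": [3.0, 3.5, 4.0],
    "direct_reports": [2.5, 3.0, 3.5],
}

# Pool all observer ratings, then compare against self-perception.
all_observers = [r for group in observer_ratings.values() for r in group]
observer_mean = mean(all_observers)
gap = self_rating - observer_mean  # positive gap = self-rating exceeds others'

print(f"observer mean: {observer_mean:.2f}, self-other gap: {gap:+.2f}")
```

A positive gap flags a potential blind spot (the leader rates themselves higher than observers do); a negative gap flags a potential hidden strength. Either discrepancy is a natural starting point for a coaching conversation.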
Sign #2: The Content Is Generic — One Size Fits No One
A second warning sign is a program that delivers identical content to every participant regardless of their individual strengths, developmental needs, or leadership context. While there is value in shared frameworks and common language, a purely generic curriculum wastes time and resources by teaching leaders what they already know while glossing over the specific areas where they most need growth.
Industrial-organizational psychology has long established that development is most effective when it is tailored to the individual (London & Smither, 1995). Adults learn and change behavior most readily when they perceive that the content is directly relevant to their real-world challenges — a principle rooted in Knowles' (1980) theory of andragogy. A leader who excels at Empowerment & Delegation but struggles with Motivation & Development needs a fundamentally different learning path than a peer with the opposite profile.
Assessment data makes individualized development scalable. When each participant enters a program with a detailed personality profile (such as that provided by a Big Five assessment like the ELLSI), a 360-degree leadership competency report, and insight into their values orientation, facilitators and coaches can create targeted development plans rather than generic lesson plans. The AL360, for instance, breaks leadership down into 19 specific factors across its six domains, enabling precise identification of where each leader should focus their developmental energy.
This precision matters at the cohort level as well. When program designers can aggregate baseline data across an entire leadership cohort, they can identify systemic patterns — perhaps the organization's leaders are collectively strong in Leadership Philosophy but weak in Adaptive Leadership — and allocate program time accordingly. This data-informed design ensures that every hour of the program addresses a real, documented need rather than a hypothetical one.
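Cohort-level analysis of this kind is a straightforward rollup of baseline scores. A minimal sketch, using invented scores and only two of the AL360's six domains for brevity:

```python
from statistics import mean

# Illustrative baseline scores (1-5) per leader per domain; the domain
# names follow the AL360's domains, but the data itself is invented.
cohort = {
    "leader_a": {"Leadership Philosophy": 4.2, "Adaptive Leadership": 2.8},
    "leader_b": {"Leadership Philosophy": 4.0, "Adaptive Leadership": 3.1},
    "leader_c": {"Leadership Philosophy": 4.4, "Adaptive Leadership": 2.6},
}

# Aggregate across the cohort to surface systemic patterns.
domains = {d for scores in cohort.values() for d in scores}
domain_means = {d: mean(s[d] for s in cohort.values()) for d in sorted(domains)}

# Sort weakest first so program time can be allocated to documented needs.
for domain, avg in sorted(domain_means.items(), key=lambda kv: kv[1]):
    print(f"{domain}: {avg:.2f}")
```

In this invented cohort, Adaptive Leadership averages well below Leadership Philosophy, which would argue for shifting program hours toward the weaker domain.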
Sign #3: There Is No Follow-Up — Development Stops When the Program Ends
Behavioral change does not happen in a workshop. It happens in the weeks, months, and quarters after a workshop, when leaders attempt to apply new skills in the complex, often resistant environment of their everyday work. Programs that lack a structured follow-up mechanism — coaching sessions, accountability partners, re-assessment milestones — are essentially planting seeds and never watering them.
The research on transfer of training is unequivocal on this point. Baldwin and Ford's (1988) seminal model of training transfer identified three critical categories of factors that determine whether learning translates into sustained behavior change: trainee characteristics, training design, and work environment factors. Follow-up mechanisms address all three by keeping the individual motivated, reinforcing the training content, and signaling organizational support for continued growth.
Assessment data provides the backbone for effective follow-up. When leaders have a baseline AL360 report, they have a concrete set of developmental targets that can be revisited in coaching conversations, discussed in peer learning groups, and formally re-measured at six- or twelve-month intervals. This creates a natural accountability loop: leaders know that their development will be measured again, which research on goal-setting theory (Locke & Latham, 2002) suggests significantly increases effort and persistence.
Furthermore, post-program re-assessment generates powerful narratives of change. When a leader can see that their direct reports' ratings on Employee Involvement have increased by a meaningful margin since the program began, the abstract concept of "development" becomes tangible and motivating. Conversely, when re-assessment reveals areas that have not improved, it provides invaluable information for refining the next phase of development rather than declaring premature success.
Sign #4: ROI Claims Are Entirely Anecdotal
"Participants loved it." "We got great feedback on the facilitator." "Leaders said they felt more confident." These statements are not evidence of program effectiveness. They are measures of participant satisfaction — what Kirkpatrick's (1959) classic evaluation model designates as Level 1 outcomes. While satisfaction is not irrelevant, it is the weakest predictor of actual behavior change, performance improvement, and business results.
Organizations that can only offer anecdotal or reaction-level evidence for their leadership development programs are leaving themselves vulnerable on multiple fronts. Internally, they cannot defend their budgets with the rigor that finance and operations functions routinely apply to their own investments. Externally, they risk perpetuating leadership development's reputation as a "soft" expenditure — a nice-to-have that can be cut when budgets tighten.
Moving from anecdotal ROI to data-driven ROI requires measurement at Kirkpatrick's higher levels: Level 2 (learning), Level 3 (behavior change), and ideally Level 4 (business results). Psychometric assessments contribute directly to Levels 2 and 3. Pre/post knowledge and attitude measures address learning, while pre/post 360-degree assessments directly measure observable behavior change as reported by multiple stakeholders in the leader's environment.
The AL360's structure is particularly well-suited to this purpose. Because it is grounded in established theoretical frameworks — including Self-Determination Theory (Deci & Ryan, 1985), Psychological Safety (Edmondson, 1999), and Adaptive Leadership research (Heifetz, Grashow, & Linsky, 2009) — changes in scores are not merely statistical movements but reflections of theoretically meaningful behavioral shifts. When an organization can report that leaders in a development cohort showed statistically significant improvement in Empowerment & Delegation or Communication & Relations as rated by their teams, the conversation with senior leadership shifts from "trust us, it's working" to "here is the evidence."
For organizations seeking to connect leadership behavior change to Level 4 business outcomes, assessment data provides the critical mediating variable. Research has consistently linked leadership behaviors measured by 360 instruments to team engagement, retention, and performance (Hogan, Curphy, & Hogan, 1994). While establishing direct causal links between a leadership program and bottom-line revenue is always methodologically complex, a chain of evidence — from assessment data showing behavior change, to engagement survey data showing improved team climate, to operational metrics showing performance improvement — is far more compelling than testimonials alone.
Sign #5: Leaders Are Not Developing Self-Awareness — They Are Just Attending Events
The ultimate purpose of leadership development is not to fill seats in a classroom or check a box on a competency model. It is to catalyze genuine self-awareness and sustained behavior change. Yet without assessment data, many programs inadvertently treat development as an event rather than a process — something leaders attend rather than something that fundamentally shifts how they understand themselves and their impact on others.
Self-awareness is widely recognized as the foundation of effective leadership. Research by Eurich (2018) suggests that while 95% of people believe they are self-aware, only about 10-15% actually meet objective criteria for self-awareness. This gap is not merely academic; it has direct consequences for leadership effectiveness, team trust, and organizational performance. Leaders who lack accurate self-perception tend to overestimate their strengths, underestimate their weaknesses, and resist the very feedback that could accelerate their growth.
Multi-rater assessments are among the most powerful tools available for closing the self-awareness gap. When a leader completes a self-assessment and then receives aggregated ratings from supervisors, peers, and direct reports, the resulting comparison often reveals blind spots that no amount of classroom instruction could surface. The AL360 is designed to facilitate exactly this kind of insight, presenting leaders with a detailed map of how their behavior is perceived across 19 factors and highlighting discrepancies between self-perception and observer perception.
Complementary assessments deepen this self-awareness further. A personality assessment grounded in the Big Five model provides insight into stable dispositional tendencies that shape leadership style — such as whether a leader's natural tendencies toward extraversion or conscientiousness are assets or liabilities in their current role. A values assessment can reveal whether a leader's fundamental assumptions about people and motivation align with the organizational culture they are expected to build. When these layers of data are integrated, leaders move beyond surface-level self-knowledge into a rich, nuanced understanding of their leadership identity.
This depth of self-awareness is what separates leaders who merely attend development events from leaders who are genuinely transformed by them. And it is what separates programs that are "nice to have" from programs that are measurably, demonstrably essential to the organization's success.
Turning Warning Signs into a Data-Driven Strategy
Recognizing these five warning signs is the first step. The second step is building assessment data into the architecture of the leadership development program — not as an afterthought or add-on, but as a foundational design element. The most effective programs follow a straightforward cycle:
- Assess: Establish baselines using validated instruments — 360-degree leadership assessments, personality profiles, values inventories, and communication styles — before any development content is delivered.
- Individualize: Use assessment results to tailor development plans, coaching conversations, and program content to the documented needs of each participant and the cohort as a whole.
- Develop: Deliver targeted learning experiences — workshops, coaching, action learning, peer groups — that are directly connected to the competencies and behaviors identified in the assessment data.
- Re-assess: Administer the same instruments at defined intervals to measure change, celebrate growth, identify persistent gaps, and generate evidence for program ROI.
- Refine: Use the re-assessment data to continuously improve the program — adjusting content, reallocating time, and evolving the design based on what the data reveals about what is working and what is not.
This cycle transforms leadership development from a periodic event into a continuous, measurable process of organizational improvement. It gives L&D professionals the evidence they need to advocate for their budgets, gives coaches and facilitators the data they need to maximize their impact, and gives leaders the self-awareness they need to actually change.
The bottom line: Assessment data does not replace great facilitation, powerful content, or transformative coaching. It amplifies all of them — providing the measurement backbone that turns leadership development from an act of faith into a disciplined business investment.
Organizations ready to bring this level of rigor to their leadership development programs can start with the Achieving Leader 360 (AL360), which provides a comprehensive, multi-rater assessment across the six leadership domains most critical to organizational effectiveness. For a consultation on how to integrate assessment data into an existing or planned leadership development initiative, visit the FactorFactory contact page to connect with the team directly.
