By Kent E. Frese, Ph.D. — Industrial-Organizational Psychologist and Founder, FactorFactory
Organizations in the United States spend an estimated $166 billion annually on leadership development (Training Industry, 2023). Yet when pressed to demonstrate what that investment produced, most organizations struggle to point to anything more concrete than participant satisfaction surveys and anecdotal feedback. The problem is rarely the training itself — it is the absence of a measurement framework that connects development activities to observable behavioral change and, ultimately, to business outcomes.
Research consistently shows that leadership development programs grounded in assessment data produce more durable behavior change and higher return on investment than those that rely on generic content delivery alone (Day et al., 2014). Yet many organizations — particularly small and mid-sized businesses where budgets are tight and every dollar needs to work — continue running programs that lack even basic pre- and post-measurement. The result is a cycle of investment without evidence: programs get funded because leadership development "feels important," but they are also the first line item cut when budgets tighten, precisely because no one can demonstrate their value.
The following five warning signs indicate that a leadership development program is operating without the assessment infrastructure it needs. Recognizing these signs is the first step toward transforming leadership development from a discretionary expense into a measurable business investment.
Sign 1: There Is No Baseline — You Cannot Measure What You Did Not Define
The most fundamental warning sign is also the most common: the program launched without establishing a baseline measurement of where participants stood before development began. Without a starting point, there is no way to quantify growth, identify areas that improved, or determine which participants benefited most. It is the equivalent of starting a fitness program without stepping on a scale or recording a single benchmark — three months later, you might feel better, but you have no evidence of progress.
A robust baseline involves more than a self-assessment questionnaire. Multi-rater feedback instruments such as 360-degree assessments provide a far more accurate picture because they capture how leaders are perceived by the people they actually lead. Self-ratings of leadership effectiveness are notoriously inflated; research by Atwater and Yammarino (1992) demonstrated that self-other agreement is a meaningful predictor of leadership effectiveness and that the discrepancy between how leaders see themselves and how others see them is itself diagnostic. A 360-degree assessment administered before development begins establishes not just where a leader is, but where the most significant perception gaps exist — information that should drive the development plan.
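For readers who want to see the mechanics, the self-other discrepancy described above reduces to simple arithmetic: a leader's self-rating on each factor minus the average of their raters' scores. The sketch below uses invented factor names and ratings, not actual AL360 factors, scales, or data.

```python
# Hypothetical illustration of a self-other perception gap calculation.
# Factor names and all scores are invented for the example.

def self_other_gaps(self_ratings, other_ratings):
    """Return the per-factor gap: self rating minus the mean of others' ratings.

    Large positive gaps flag factors where a leader rates themselves
    higher than their raters do, which is often the most diagnostic
    information for development planning.
    """
    gaps = {}
    for factor, self_score in self_ratings.items():
        others = other_ratings[factor]
        gaps[factor] = round(self_score - sum(others) / len(others), 2)
    return gaps

self_ratings = {"Employee Involvement": 4.5, "Delegation": 4.2}
other_ratings = {
    "Employee Involvement": [3.0, 3.5, 2.5],  # direct report ratings
    "Delegation": [4.0, 4.3, 4.1],
}
print(self_other_gaps(self_ratings, other_ratings))
# → {'Employee Involvement': 1.5, 'Delegation': 0.07}
```

In this invented example, the large gap on Employee Involvement (the leader sees 4.5, direct reports see 3.0) is exactly the kind of finding that should anchor the development plan.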
Baseline data also protects the organization's investment. When a company commits $50,000 to a leadership development initiative — a realistic figure for a mid-sized business running a cohort through a year-long program — stakeholders deserve evidence that the money produced measurable change. Pre-assessment data makes that evidence possible; without it, the organization is left relying on hope and good intentions.
Sign 2: The Content Is Generic — Everyone Gets the Same Program Regardless of Need
A second warning sign is the one-size-fits-all curriculum. Every participant attends the same workshops, reads the same case studies, and completes the same exercises — regardless of whether their primary development need is delegation, communication, strategic thinking, or managing conflict. This approach assumes all leaders need the same things, which decades of research on individual differences demonstrates is simply not the case (Barrick & Mount, 1991; Judge et al., 2002).
Assessment data enables targeted development. When personality assessments reveal that a leader scores high on conscientiousness but low on openness to experience, the development conversation shifts from generic "become a better leader" to specific "build comfort with ambiguity and experimentation." When 360-degree feedback shows that direct reports rate a leader low on employee involvement but high on task direction, the program can focus precisely on the behaviors that will close that gap. The Achieving Leader 360 (AL360), for example, measures 19 distinct leadership factors across six domains — from Communication and Relations to Empowerment and Delegation to Adaptive Leadership — providing the granularity needed to build individualized development plans rather than generic curricula.
In a meta-analysis of 360-degree feedback studies, Smither, London, and Reilly (2005) found that improvement was most likely when feedback was specific, when participants set concrete goals based on that feedback, and when they discussed their results with a coach or supervisor. Generic programs bypass all three of these mechanisms. They deliver broad content, set vague objectives, and rarely connect individual assessment results to the learning agenda.
For small and mid-sized businesses, the cost of a generic approach is particularly steep. These organizations cannot afford to send ten leaders through an identical program when only three of them need the content being delivered. Assessment data ensures that every development dollar targets an actual need.
Sign 3: There Is No Structured Follow-Up — The Program Ends When the Workshop Ends
The third warning sign is the absence of follow-up. The workshops conclude, the binders go on shelves, and no one circles back to measure whether behavior actually changed. Research on the transfer of training — the degree to which learned skills are applied on the job — consistently shows that without reinforcement, follow-up, and accountability, the vast majority of training content fails to transfer (Baldwin & Ford, 1988). Estimates vary, but the commonly cited figure is that only 10-20% of training investments result in sustained behavioral change (Saks & Belcourt, 2006).
Post-assessment data is what closes this loop. Administering the same 360-degree assessment six to twelve months after a development program provides objective evidence of behavioral change — or the lack thereof. It answers questions that satisfaction surveys cannot: Did direct reports notice a difference? Did the leader's self-awareness improve? Did specific targeted behaviors actually shift? This is not about creating a punitive accountability system; it is about creating a learning system. When leaders see their post-assessment data alongside their baseline, the feedback becomes deeply personal and motivating in a way that no workshop evaluation form can replicate.
Follow-up assessment also enables the organization to refine its approach. If a program consistently produces improvement in communication behaviors but not in delegation, that pattern reveals a gap in the curriculum or the coaching support structure. Without post-data, the organization keeps running the same program year after year with no mechanism for improvement.
Sign 4: ROI Is Measured by Anecdote — "People Seemed to Like It"
When asked about the return on a leadership development investment, many organizations default to anecdotal evidence: "The participants said it was valuable." "Our CEO noticed that the leadership team is communicating better." "We haven't had any turnover in that group since the program." While these observations are not meaningless, they do not constitute evidence of program effectiveness, and they are deeply vulnerable to confirmation bias and recency effects.
Kirkpatrick's (1994) four-level evaluation model — Reaction, Learning, Behavior, and Results — remains the standard framework for evaluating training programs. Most leadership development programs measure only Level 1 (Reaction): Did participants enjoy it? Assessment data enables measurement at Level 3 (Behavior): Did participants actually change how they lead? This is the level that matters most for organizational impact, and it is the level that requires measurement instruments beyond post-workshop smile sheets.
Consider the difference between these two statements to a board or ownership group: "Participants rated the program 4.7 out of 5" versus "Across the cohort, 360-degree feedback scores on Empowerment and Delegation improved by 18% from baseline to post-assessment, and direct report ratings of psychological safety increased by 22%." The first statement describes satisfaction. The second demonstrates impact. For organizations investing meaningful resources in leadership development, the distinction is the difference between a cost center and a strategic investment.
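Percent-change figures like those in the second statement require no sophisticated analytics, only baseline and post-assessment data for the same cohort on the same instrument. The sketch below uses invented ratings on a hypothetical 1-5 scale to show the calculation.

```python
# Hypothetical illustration: percent change in a cohort's mean rating
# from baseline to post-assessment. All scores are invented.

def percent_change(baseline_scores, post_scores):
    """Percent change in the cohort mean, baseline to post-assessment."""
    base_mean = sum(baseline_scores) / len(baseline_scores)
    post_mean = sum(post_scores) / len(post_scores)
    return round((post_mean - base_mean) / base_mean * 100, 1)

# Invented Empowerment and Delegation ratings for an eight-person cohort
baseline = [3.0, 2.8, 3.2, 3.1, 2.9, 3.0, 3.3, 2.7]
post     = [3.6, 3.4, 3.7, 3.5, 3.4, 3.6, 3.8, 3.2]
print(percent_change(baseline, post))
# → 17.5
```

The point is not the arithmetic itself but what it requires: without a baseline (Sign 1) and a follow-up assessment (Sign 3), this calculation is impossible, and the organization is left with satisfaction scores alone.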
The research is clear that multi-source feedback, when embedded in a development system with coaching and follow-up, produces meaningful behavior change (Smither et al., 2005). Assessment data provides the measurement architecture that makes this possible.
Sign 5: Development Decisions Are Based on Assumptions, Not Data
The fifth warning sign is perhaps the most insidious: the organization selects who participates in leadership development and what they work on based on assumptions, politics, or seniority rather than data. The most vocal leader gets the executive coaching engagement. The longest-tenured manager gets sent to the prestigious program. The owner's assessment of who "has potential" drives the entire talent investment strategy — without any objective measurement to validate or challenge those assumptions.
Research on succession planning and leadership talent identification consistently demonstrates that subjective judgments of leadership potential are unreliable and often biased by similarity effects, halo effects, and recency bias (Silzer & Church, 2009). Assessment data provides an objective counterweight — not replacing judgment, but informing it. Personality assessments reveal dispositional strengths and risk factors. Values assessments surface alignment or misalignment with organizational culture. Cognitive ability assessments predict capacity for complex problem-solving. And 360-degree assessments reveal the gap between reputation and self-perception, which is often the most critical data point for development planning.
When organizations combine multiple assessment dimensions — behavioral style, personality, values, cognitive ability, and multi-rater feedback — they build a comprehensive picture of each leader that is far more reliable and actionable than any single data point or subjective impression. FactorFactory's AL360 assessment, grounded in Self-Determination Theory (Deci & Ryan, 2000), psychological safety research (Edmondson, 1999), and adaptive leadership frameworks (Heifetz et al., 2009), provides a structured approach to measuring leadership behavior across the dimensions that research shows matter most for organizational outcomes.
From Practice
A technology services firm with approximately 100 employees had been promoting its best individual contributors into management roles — a common pattern in growing companies where technical expertise is abundant but management infrastructure is thin. The company invested in external training for these new managers: workshops on communication, time management, and team leadership from a well-regarded regional provider. The content was solid. The facilitators were engaging. End-of-session evaluations were consistently positive.
Yet eighteen months into this approach, the company was experiencing higher turnover among front-line employees reporting to these newly promoted managers than it had seen in the previous five years. Exit interviews pointed to a recurring theme: employees did not feel heard, did not feel trusted with meaningful work, and described their managers as technically proficient but interpersonally disconnected. The workshops had not moved the needle because they were treating all new managers identically, without diagnosing what each individual actually needed to develop.
FactorFactory was engaged to establish a baseline using the AL360 with the cohort of eight new managers. The results were illuminating. As a group, these managers scored above average on Leadership Philosophy — they understood intellectually what good leadership looked like. But their direct reports rated them significantly lower on Employee Involvement and Empowerment and Delegation — the behavioral dimensions that employees experience day-to-day. Several managers showed a pattern common among former individual contributors: they understood delegation conceptually but continued to retain control of work because they were faster at it themselves or did not trust the output quality of their teams.
With this data in hand, the development program was restructured. Instead of generic workshops, each manager received an individualized development plan targeting their two lowest-rated AL360 domains. Coaching sessions focused specifically on the behaviors flagged in the 360 feedback. Nine months later, a post-assessment showed meaningful improvement: direct report ratings on Empowerment and Delegation improved by an average of 21% across the cohort, and front-line turnover in those managers' teams dropped below the company's historical average. The owner, who had been skeptical about the value of "more assessments," acknowledged that for the first time, leadership development felt like something he could see working — because the data showed it was.
Turning Leadership Development into a Measurable Investment
The five warning signs outlined above share a common root cause: the absence of a measurement framework. Without assessment data, leadership development programs cannot establish baselines, personalize content, follow up on behavior change, demonstrate ROI, or make data-informed decisions about talent investment. The result is programs that may feel valuable in the moment but cannot demonstrate lasting impact — which makes them perpetually vulnerable to budget cuts and organizational skepticism.
The solution is not to abandon leadership development or to replace it with assessment alone. The solution is to embed assessment into the development process at every stage: baseline measurement before the program begins, data-driven personalization of content during the program, and post-assessment follow-up to measure and reinforce behavior change. This is the architecture that transforms leadership development from an act of faith into an evidence-based business investment.
For organizations ready to build this measurement infrastructure, the Achieving Leader 360 (AL360) provides a comprehensive, research-grounded starting point — measuring 19 leadership factors across six domains that capture how leaders think, communicate, involve, motivate, delegate, and adapt. Combined with personality, behavioral, and values assessments, it creates the kind of multi-dimensional leadership profile that makes development genuinely targeted and genuinely measurable. To explore how assessment data can strengthen your leadership development strategy, visit FactorFactory's contact page to start a conversation.
