Our approach to the O'Reilly Winter Architectural Kata 2025
- Team
- Introduction
- Key Objectives
- Product Implementation Decisions
- Best Practices
- Event Storming
- Requirements
- Architecture Characteristics
- Fitness Functions
- Architecture Style
- Architecture
- Known Limitations
- Usage of GenAI
- Research
- Appendix
- Glossary
- Version History
- Karl Kyck, LinkedIn
- Ivan Houston, LinkedIn
- Craig McCarter, LinkedIn
- Darren Muldoon, LinkedIn
- Susan O'Brien, LinkedIn
Certifiable, Inc. is an accredited leader in the software architecture certification market, primarily based in the United States. As the owner of a substantial market share, the company’s flagship system, SoftArchCert, provides accredited certification for qualified software architects.
- Established as a trusted authority in software architecture certification.
- Significant market share in the United States.
- Recent global acceptance leading to increased demand from Europe, the U.K., and Asia.
- Anticipated surge in certification requests (5-10 times current volume).
- Concerns over existing manual processes meeting the heightened demand.
- Exploring the integration of Generative AI to enhance systems and streamline operations.
- Goals: Effectively handle the influx of certification applications.
AI-Enhanced System - TL;DR:
- AI Grading for the Aptitude Test
- AI RAG Ensemble-based grading for Architecture Submissions
- AI Architecture Question Generation
- AI-Generated Feedback
Rejected Functionality
- AI Agent to monitor candidate cheating (Rejected)
- LLM Caching in the system (Rejected)
The following ADRs provide the detail around these decisions, along with the safeguards and other considerations.
- Automate Grading and Review Processes: Explore ways in which Generative AI can automate the grading and review of software architecture submissions, reducing the burden on expert architects and accelerating response times for certification.
- Identify AI Opportunities: Assess the current SoftArchCert system to identify specific areas where Generative AI can be applied to improve efficiency, scalability, and accuracy in handling certification requests.
- Redesign System Architecture: Develop a comprehensive plan for redesigning the SoftArchCert system architecture to incorporate AI capabilities, ensuring the architecture can support the anticipated growth in certification volume.
- Enhance User Experience: Identify improvements in user experience for both applicants and certifiers through the integration of AI, ensuring that the certification process remains intuitive and efficient.
- Ensure Compliance and Quality: Establish mechanisms to maintain the integrity and quality of certification standards, ensuring that the integration of AI does not compromise the rigorous evaluation process that Certifiable, Inc. is known for.
- Maintain Our Reputation: Identify areas of potential concern to our security posture and reputation as the company scales and introduces AI use.
- Improve Operational Efficiency and Profitability: Scale in a manner that allows us to remain profitable, efficient, and able to meet our SLAs.
We will adhere to best practices in AI architecture design, ensuring we include guardrails and evals where appropriate:
- Modular Design:
Structure the system into distinct, independent modules that can be developed, tested, and maintained separately. This will enhance flexibility, allow for easier updates, and improve collaboration. For example, separate modules for data preprocessing, model training, and model inference.
- Scalability:
Design a system that can handle increased loads and expanded datasets without significant rewrites. This will facilitate the growth of the system in response to higher data volumes or more users.
- Data Management:
Implement a robust data pipeline for data ingestion, preprocessing, and storage, and ensure data quality and governance practices are in place. This will streamline data access and ensure that high-quality data is used for model training, improving model accuracy.
- Model Lifecycle Management:
Adopt practices for versioning, testing, and deploying AI models (e.g., CI/CD for models). This will ensure reproducibility, facilitate rollback to previous versions, and maintain performance over time. Tools such as MLflow or Kubeflow can be used for model lifecycle management.
- Monitoring and Evaluation:
Set up monitoring systems to track model performance, data drift, and anomalies after deployment. This will help maintain model performance over time and allow for timely interventions when models begin to degrade.
- Explainability and Transparency:
Design systems to provide insights into how AI models make decisions (e.g., using LIME or SHAP). This will build trust among stakeholders and aid compliance with regulations such as GDPR, which requires explainability in AI systems.
- Ethical Considerations:
Incorporate ethical guidelines into the design process to address bias, fairness, and accountability in AI. This will foster responsible AI development and use, reducing the risk of harm to individuals or groups.
- Feedback Loops:
Design systems to incorporate feedback from users or performance metrics to continuously improve models. This will facilitate ongoing refinement of models and alignment with user needs.
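As a concrete illustration of the Monitoring and Evaluation and Feedback Loops practices above, the sketch below flags data drift when a new batch of grading scores deviates from a baseline distribution. The function name, threshold, and sample scores are illustrative assumptions, not part of the SoftArchCert design:

```python
import statistics

def detect_mean_drift(baseline, current, threshold=2.0):
    """Flag drift when the current batch mean deviates from the baseline
    mean by more than `threshold` baseline standard deviations.
    A minimal sketch; production monitoring would track full distributions."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    z = abs(statistics.mean(current) - mu) / sigma
    return z > threshold

# Example: grading-score distributions before and after a model update
baseline_scores = [72, 75, 78, 74, 76, 73, 77, 75]
stable_batch = [74, 76, 75, 73, 77]
shifted_batch = [55, 58, 60, 57, 56]

print(detect_mean_drift(baseline_scores, stable_batch))   # False (no drift)
print(detect_mean_drift(baseline_scores, shifted_batch))  # True (drift detected)
```

A drift alert like this would trigger the "timely interventions" described above, such as pausing auto-grading and routing submissions to expert architects.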
We used Event Storming to explore and map out various domains within our business processes. The session focused on capturing and documenting the flow of events, actors, and interactions between systems. This document summarizes our outcomes and serves as a guide to understanding our approach.
- Integration of Generative AI
  - Identify opportunities to incorporate Generative AI into the SoftArchCert system to enhance certification processes.
- Automated Grading
  - Implement AI-assisted grading for short answer questions in the aptitude test and architecture submissions to reduce the manual workload on expert software architects.
- Certification Process Enhancement
  - Automate parts of the certification process to improve efficiency, such as notification systems for candidates based on their test results.
- User Interface Improvements
  - Develop an intuitive user interface for candidates, HR representatives, and expert software architects to interact with the certification system.
- Scalability
  - The system must be designed to handle a significant increase in certification requests (5-10X) due to global expansion.
- Dynamic Case Study Management
  - Implement a mechanism to dynamically manage and update case studies and questions to prevent information leaks and maintain current relevance in the certification process.
- Data Retrieval
  - Enable effective information retrieval processes that provide relevant data and context to the Generative AI model when grading submissions.
- Feedback Mechanism
  - Provide candidates with detailed feedback on their performance, including AI-generated insights alongside expert evaluations.
- Accuracy and Reliability
  - Ensure the accuracy of grading and certification processes, as incorrect assessments can significantly impact candidates' careers.
- Compliance and Standards
  - The system must comply with relevant legal and accreditation standards, including those set by the Software Architecture Licensing Board (SALB).
- Performance and Efficiency
  - Maintain a guaranteed turnaround time for grading (1 week for both tests) while accommodating increased demand without compromising service levels.
- Cost Management
  - Analyze the cost implications of integrating AI into the certification process and ensure that it remains within budgetary constraints.
- Security and Data Privacy
  - Implement data security measures to protect sensitive candidate information and certification data.
- Monitoring and Evaluation
  - Establish metrics and monitoring tools to evaluate the performance and effectiveness of the AI-enhanced grading system.
- Documentation and Compliance
  - Ensure proper documentation of AI implementation processes for compliance and auditing purposes.
- User Trust and Ethics
  - Address potential ethical concerns about AI use in grading, ensuring that the responses generated are appropriate and reliable.
We chose the following as our top 3 architectural characteristics:
- testability
- data integrity
- fault tolerance
Architecture Fitness Functions provide an objective measurement of architectural characteristics.
We have designed two fitness functions that will be used to verify the data integrity of both the Aptitude Test and Architecture Submission workflows.
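One such data-integrity fitness function can be expressed as a simple invariant: every submission that enters a workflow must appear in each downstream stage. The sketch below is illustrative; the stage names and IDs are our assumptions, not the actual implementation:

```python
def fitness_no_records_lost(submitted_ids, graded_ids, notified_ids):
    """Data-integrity fitness function sketch: every submission that
    entered the workflow must appear in each downstream stage.
    Stage names are illustrative placeholders for the Aptitude Test
    and Architecture Submission pipelines."""
    submitted = set(submitted_ids)
    return submitted == set(graded_ids) and submitted <= set(notified_ids)

# Passing case: all three stages agree
print(fitness_no_records_lost([1, 2, 3], [3, 2, 1], [1, 2, 3]))  # True
# Failing case: submission 3 was never graded
print(fitness_no_records_lost([1, 2, 3], [1, 2], [1, 2, 3]))     # False
```

Run continuously against production data, a check like this turns the data-integrity characteristic into an objective, automatable measurement.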
Based on the top 3 driving characteristics:
- testability
- data integrity
- fault tolerance
A service-based architecture was selected as the optimal balance between the driving architecture characteristics: testability, data integrity, and fault tolerance. Although the microservices style was also indicated by these key driving characteristics, the implicit characteristics of feasibility (cost/time), simplicity (indicated by feasibility), and maintainability, together with the fact that the majority of the existing architecture already follows a service-based style, steered us towards a service-based architecture:
By utilizing the C4 approach to visualize software architecture, we depicted the dependencies among the different components of our application and highlighted their relationships. We primarily concentrate on the C2 (container) views to present a high-level overview, followed by a deeper exploration of the core components in the C3 (component) views later in the process.
Container diagrams for new architectural components
The current architecture has the following limitations:
- In the future we could implement an AI agent that uses machine learning algorithms to monitor candidates during assessments, analyze behavioral patterns, and identify anomalies that may indicate cheating. We decided against this for the MVP, as outlined in ADR-10, AI Agent to monitor candidates for cheating.
This repository contains content that was partially generated using Generative AI technologies. We have utilized these tools to assist in the creation and enhancement of certain aspects of the project. However, please note that all AI-generated content has been carefully reviewed and edited by our team to ensure accuracy, quality, and relevance before being included in the project.
Research was undertaken to understand what mature AI capabilities, architecture patterns, practices, and approaches are currently available in the industry:
- Prompt engineering and RAG retrieval simulation
- Experimental application demonstrating RAG vector database loading, retrieval, LLM context loading, and prompting
- LLM cost analysis for various LLM models and scenarios
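The RAG retrieval experiment above can be sketched in miniature: rank stored passages by cosine similarity to a query embedding and prepend the top matches to the grading prompt. The toy three-dimensional vectors, document texts, and function names below are stand-ins for real embedding-model output, not the experimental application itself:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, k=2):
    """Return the k document texts whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return [doc["text"] for doc in ranked[:k]]

# Toy vector store: 3-dimensional stand-ins for real embeddings
store = [
    {"text": "Rubric: evaluate component coupling", "vec": [0.9, 0.1, 0.0]},
    {"text": "Rubric: evaluate data-flow diagrams", "vec": [0.1, 0.9, 0.0]},
    {"text": "Unrelated HR onboarding note",        "vec": [0.0, 0.1, 0.9]},
]

context = retrieve([0.8, 0.2, 0.0], store, k=2)
# The retrieved passages would be prepended to the grading prompt (LLM context loading):
prompt = "Grade this submission using:\n" + "\n".join(context)
```

The same shape, retrieve then prompt, underpins the RAG ensemble grading for architecture submissions, with a real vector database and embedding model in place of the in-memory store.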
- Overview Narrative
  - A short narrative describing how the team used AI in the certification system.
- Diagrams
  - Comprehensive and targeted views of each use of AI in the system.
- Architectural Decision Records (ADRs)
  - Document AI implementation decisions, including trade-off analysis.
- Implementation Details
  - Provide pertinent implementation details for the AI components.
- Presentation Video (for semi-final teams)
  - A five-minute video describing the team’s approach to integrating AI into the certification process.
  - Link to Presentation Video
- Innovative Use of Generative AI
  - Assess the novelty and creativity of AI applications in the solution.
- Suitability of the Solution
  - Ensure that the solution addresses the stated constraints and requirements.
- Detail and Clarity
  - Provide appropriate levels of detail in documentation and diagrams.
- Use of AI Architecture Patterns
  - Adhere to best practices in AI architecture design.
- Avoidance of Anti-Patterns
  - Ensure that the design does not incorporate known anti-patterns.
- Compatibility with Existing Architecture
  - Validate that the architectural characteristics of the AI enhancements align with the current system architecture.
- Validation and Verification
  - Establish processes for validating and verifying AI-generated results to ensure accuracy.