What We Did Right
Adopted Agile development. We use Scrum, and we brought in an agile coach.
Unit tests using PHPUnit and CruiseControl.
Functional tests using Selenium.
We installed writable walls, so we never waste much time finding space to sketch our ideas, designs, and discussions.
Put high-priority user stories and user stories that require QA at the top. Everyone swarms on the top user story and gets it done before moving on to the next. We would rather get 80% of the user stories completely done than get every user story 80% done (so that no user story is completely done).
Use a wiki to document architectural components and designs.
We hired smart and funny engineers. This makes our scrum and sprint review meetings fun.
Installed Pidgin and use it for our internal instant messaging. We use it to archive the discussions that happen during our release process.
We implemented a change control process. During development, if a developer needs to modify a configuration file, install new software, and so on, the developer files a change control ticket in the same release. The developer can then close the original ticket with a reference to the change control ticket. When the operations team prepares a release, they hold a change control meeting in which they review all change control tickets and make an appropriate plan for the release. Change control tickets are closed by the operations team after a successful release.
We use YAML as the format for our configuration files.
We monitor all of our background processes. We use Nagios to monitor, and Cacti to graph, all of our network devices, servers, and services (mysqld, Apache, background processes).
Use Splunk to aggregate log files.
Each team has a backlog.
Definition of DONE: code complete + documented + unit tested + validated by QA.
We did not have many offices or cubicles. We had open areas where people sat and worked together, so furniture did not get in the way. This made pair programming easier. Furniture and cubicle walls were easily reconfigurable, which allowed us to adjust our working areas as needed. We had a few cubicles for managers, product managers, and the scrum master. Engineers all sat in open areas.
Occasionally we pull engineers, UI designers, and database engineers into a big conference room when we have large projects. This makes collaboration much easier.
Sprint Retrospective. We have 3 columns (Start, Stop, and Keep). Developers think of things that we should start, stop, or keep doing. We then vote on each item, and during our next sprint we focus on the 2 or 3 items with the most votes.
Scrum and color-coded post-it cards. Our scrum boards have 4 columns: To Do, In Progress, Ready for QA, and Validated. For each user story, we use matching color-coded post-it cards. The user story is written on a large green post-it card, and its tasks are written on small green post-it cards. This makes it easy to look across the board and correlate tasks with their user story. We use small white post-it cards for bugs that get added to the sprint after the sprint has started.
Code freeze 2 days before the sprint ends. Leave 2 days for QA and fixing bugs.
The scrum master brings the same agenda to each meeting. For the planning meeting, the scrum master proceeds using the same established agenda. This ensures that we don't skip over anything.
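The Sprint Retrospective vote described above (vote on each Start / Stop / Keep item, then focus on the 2 or 3 items with the most votes) can be sketched in a few lines of Python. This is only an illustration: the function name `pick_focus_items`, the ballot format, and the sample items are assumptions, not something our team actually ran.

```python
from collections import Counter


def pick_focus_items(votes, limit=3):
    """Return up to `limit` retrospective items, most-voted first."""
    tally = Counter(votes)  # item text -> number of votes
    return [item for item, _count in tally.most_common(limit)]


# Hypothetical ballots: each entry is one developer's vote for one item.
votes = [
    "Start: video tape design meetings",
    "Stop: slow Monday meetings",
    "Start: video tape design meetings",
    "Keep: writable walls",
    "Stop: slow Monday meetings",
    "Start: video tape design meetings",
    "Keep: color coded post-it cards",
]

# The items the team would focus on during the next sprint.
focus = pick_focus_items(votes, limit=2)
print(focus)
```

Keeping the limit at 2 or 3 is the point of the exercise: the team commits to a few improvements per sprint rather than to the whole list.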
What Can Be Improved
Video tape all the meetings and take pictures of design sketches. I would like to video tape all of the meetings. We had engineering meetings every Monday and Wednesday. The Monday meeting is for developers, and the Wednesday meeting is for developers, QA, and anyone interested. The Monday meeting often involves discussion and presentation of architectural and code design. These presentations are a valuable source for training new hires, and are also a reference for later. Say that 3 months from now I need to work on a piece of code that was presented today: it would be more convenient for me to watch the video again than to talk to the engineer who designed it, who might be busy, might have forgotten how he designed it, or might have left the company.
Agile development sometimes gets in our way. Agile development is very effective most of the time. However, for a large project, it is often helpful to follow a little of both Waterfall and Agile. The Waterfall approach emphasizes upfront design and documentation, whereas the Agile approach emphasizes getting the product out to the customer sooner and getting customer feedback. For a large project, we should still follow Agile principles, but we should do a bit of upfront research and design as well. Before the team started work on this project, the UI engineers had already researched the user interface and presented a mock interface to our internal marketing team and to some customers, so we were reasonably sure (I thought) of what we would be presenting to customers. The database engineers had also researched the schema for this project. However, the lead developers were involved in another project and did not research this one. They had a few discussions with the development manager, but not enough. When the development team started work on this project, developers did not know how to store the data in the database. There had been no discussion between the database engineers and the developers before the project started. Lead developers should not be coding. Their job should be to discuss among themselves, analyze and understand projects, come up with good designs, and lead other developers. The development manager, who was the most knowledgeable person on the project, attended a recruiting function elsewhere for a couple of days and trusted the lead developers (who were not very knowledgeable about the project) to run it. Even while present in the office, the development manager did not get involved in the development process. For this particular project, the lead developers were often the bottleneck, and the rest of the team often had unproductive slack time.
To some degree, we abused YAML configuration files.
The push / build script should integrate with the OPS repository (pull the YAML from the OPS repository, and tag it as well).
Should use an imaging solution such as SystemImager to ensure that all nodes in a group have exactly the same setup and binaries, as well as the same release of our code.
Should use an online solution for scrum and backlog management. The operations team uses scrumy.com. Perhaps there is an agile product that allows team members to submit user stories and lets the product owner accept or reject them.
Perhaps we should occasionally have a code improvement sprint? Perhaps we should have a wiki page that tracks ideas for improvement, and look at that page as part of sprint planning.
How do we use the logger effectively? Before we had the logger, our code sent an email whenever an error happened. Every morning when I came into the office, I would spend 15 minutes looking through these emails. As a result, I knew when weird conditions happened, and I had the data I needed to examine them. The logger has its benefits: it does not require a human to look through the error emails. However, it assumes that customers would call the support number. How can we have the benefits of both error emails and the logger? Can Splunk help? Can Splunk allow each developer to work in his or her own way, whichever works best for that developer (depending on the nature of the product the developer is working on)?
Meetings should be on Friday, not Monday. Like most people, I spend my weekends away from work, with my kids, and doing chores. That is good, but also bad. On Monday, I often feel sleepy, and I want to write code rather than sit in a meeting. I would prefer to have the meeting on Friday, when I have more energy and my mind is already active. We often had very slow Mondays. We came into the office around 10AM and waited until 11:30AM for the Sprint Retrospective / Planning meeting, which ended around 1PM. The task breakdown did not happen until 3 or 4PM. I would rather have the meeting and task breakdown happen on Friday. That way, I can think over the weekend about what I need to do, and come Monday do it rather than wait around. Of course, this is a team decision.
Task breakdown should happen before Sprint Planning. The development manager, architects, and senior developers should occasionally meet to discuss upcoming user stories / features and do the task breakdown ahead of time, rather than involving the entire development team. Perhaps the development manager or team leader should do the task breakdown, pulling in additional resources as needed.
Recruiting should not be the primary role of the development manager. If an important recruiting event happens during the development process of an important feature, the development manager should delegate someone else to attend the recruiting event.
The development manager needs to make absolutely sure that the team understands what needs to be done before the sprint starts. There is a belief that a person in authority (a manager) should not be part of scrum. Scrum is for team members to help each other. Team members are to talk to the entire team, not to any individual. A person in authority can participate in a scrum only as a chicken (allowed to listen, but not to talk). If only the development manager knows for sure what he wants the team to be doing, and he is not participating in scrum or actively monitoring the development process, this can lead to a lot of problems. For example, developers may think they are building a feature as the development manager asked when in fact they have misunderstood him.
Lead developers and architects should lead other developers. Lead developers should not be coding. Their job should be to discuss among themselves, analyze and understand projects, come up with good designs, lead other developers, and make sure that other developers write the application as specified and according to company guidelines. Lead developers can fix production bugs and do some light coding if they have nothing else to do. They should not be so busy coding features that they cannot be involved in designing upcoming features. This applies only when we don't have enough lead developers. As junior developers do more coding, gain more exposure to the code base, gain more design skills, and become eligible to be lead developers, the development manager determines when to rotate: a lead developer goes back to coding, giving a more junior lead developer a chance to do design work. That way, no one is stuck with just coding.
Rotate lead developers. Assuming that we have 3 or more developers who are knowledgeable enough about the system to be considered lead developers, we should have 1/3 of them writing code and 2/3 of them doing light coding, fixing critical production bugs, and designing upcoming features. So a lead developer would code for one cycle, then do design work and fix critical production bugs for 2 cycles. See above. No one should be stuck with just writing code.
Size the next sprint as part of the current sprint. As the current sprint nears its end, the development manager should work with the product owner to prioritize the user stories for the next sprint and inform the development team. The development team can then size the next sprint and do the task breakdown. Sprints often end on Friday, so ideally, by Thursday, the development manager and the product owner should have the user stories prioritized. The development team can use Friday to familiarize themselves with the user stories and do the task breakdown and sizing. Sizing the next sprint during the current sprint makes sure that we don't front-load too much. This means that the current sprint should include one task for sizing the next sprint.
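The rotation proposed under "Rotate lead developers" above (each lead developer codes for one cycle, then does design work and critical bug fixes for 2 cycles, so that about 1/3 of the leads are coding at any time) can be sketched as a simple round-robin. This is a minimal sketch under assumptions: the function name `rotation_schedule`, the placeholder lead names, and the idea of one cycle per sprint are all hypothetical, and in practice the development manager would decide when to rotate.

```python
def rotation_schedule(leads, cycles):
    """Return, for each cycle, which leads code and which do design work.

    A lead codes in a cycle when (position + cycle) is a multiple of 3,
    so every lead codes for one cycle out of every three.
    """
    schedule = []
    for cycle in range(cycles):
        coding = [lead for i, lead in enumerate(leads) if (i + cycle) % 3 == 0]
        designing = [lead for lead in leads if lead not in coding]
        schedule.append({"coding": coding, "designing": designing})
    return schedule


# Hypothetical lead developers; real names would come from the team roster.
leads = ["lead_a", "lead_b", "lead_c"]
schedule = rotation_schedule(leads, cycles=3)

for number, assignment in enumerate(schedule, start=1):
    print(number, assignment)
```

With three leads, each one codes in exactly one of the three cycles and designs in the other two. With a lead count that is not a multiple of 3, the split is only approximately 1/3 to 2/3.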