Confusion often arises from inconsistencies in coordinate systems and their jargon. Even with plain 2D coordinates, there are graphics (x,y) coordinates measured from the top left, Cartesian (x,-y) coordinates, and matrix (r,c) = (y,x) coordinates.
Our grid for this game is going to be 2.5D, in the sense that multiple entities can be at the same 2D grid position, so there is an implicit third dimension. If we think of a grid-based game as being viewed top-down, then we might think of x as West/East, and y as North/South. However, our game has gravity, albeit not very realistic, acting in the y direction.
So the best choice for the current project seems to be to use graphics coordinates, with x meaning Left/Right, y meaning Up/Down and an implicit z meaning Front/Back. We will then need to make sure that all our variable names and comments agree with this convention. Even then, there will be an exception for level files, which are most naturally stored in matrix (y,x) order.
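As a rough illustration of this convention, here is a minimal Python sketch. The level text, the names load_level and entities_at, and the use of a list per cell to stand in for the implicit z dimension are all assumptions made for the example, not part of the project itself. The point is that the swap from (x,y) to matrix [y][x] order can be confined to a single function.

```python
# Hypothetical level text: each string is one row, top row first,
# so the data is naturally in matrix (y, x) order.
LEVEL = [
    "#####",
    "#.@.#",
    "#####",
]

def load_level(rows):
    """Build a grid in matrix order: grid[y][x] is a list of entities.

    The list per cell stands in for the implicit z (Front/Back) dimension.
    """
    return [[[ch] for ch in row] for row in rows]

def entities_at(grid, x, y):
    """Look up a cell using graphics coordinates (x rightwards, y downwards)."""
    return grid[y][x]  # the only place where (x, y) becomes [y][x]

grid = load_level(LEVEL)
print(entities_at(grid, 2, 1))  # cell two columns right and one row down
```

With this arrangement, the rest of the code can speak in (x,y) throughout, and only the loader and the lookup function need to know about the storage order.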
Every time new coordinates are calculated by adding an offset, the resulting coordinates, in principle, have to be checked to see if they are within the grid.
Sentinels can be used to avoid this. In this case, that means using inert wall entities to form a border round the outside of the grid.
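The sentinel idea can be sketched in Python as follows. The characters used for walls and empty cells, and the names with_border and can_move, are hypothetical choices for the example, and the sketch assumes a rectangular level and moves of at most one cell at a time.

```python
WALL = "#"   # hypothetical inert wall entity
EMPTY = "."  # hypothetical empty cell

def with_border(rows):
    """Surround a rectangular level with a one-cell border of walls."""
    width = len(rows[0])
    edge = WALL * (width + 2)
    return [edge] + [WALL + row + WALL for row in rows] + [edge]

def can_move(grid, x, y, dx, dy):
    """Test a one-cell move from an interior cell at (x, y).

    No bounds check is needed: the border guarantees that the
    neighbouring cell exists, and a wall simply blocks the move.
    """
    return grid[y + dy][x + dx] != WALL

grid = with_border([EMPTY * 2, EMPTY * 2])
print(can_move(grid, 1, 1, -1, 0))  # moving left from (1,1) hits the border
print(can_move(grid, 1, 1, 1, 0))   # moving right from (1,1) is allowed
```

The saving depends on entities never standing on a border cell. Since walls are inert and block movement, a one-cell offset from an interior cell always lands inside the grid.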