-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds entry and exit pages to Top Pages module #712
Conversation
This is great @Vigasaurus, thank you! Entry/exit pages is for sure one of those frequently requested features! Perhaps we should rename them in the dashboard to only say "Top", "Entry" and "Exit" as I assume the current presentation would be too long for many mobile screens? Average session duration would be a useful metric for sure for the "Entry Pages" report. Perhaps we name it "Visit Duration" so it follows the same naming principles as we already use "Visit Duration" in the top graph? Makes it easy to identify strong and engaging pages that pull people in. I'm happy to keep the "Bounce Rate" in "Entry Pages" too until we do the #591. "Exit Rate" is definitely something we should try to add to the "Exit Pages" report. Makes it easy to identify pages that perform worse than others. This all will be even more powerful when we do the #458 |
100% agree on renaming the tab names. I do worry that adding visit duration to entry pages will make it overflow a lot on mobile (I intend to tackle this next probably, and the modal UI in general). So maybe we can hold off until I do that to add more columns there. I do honestly kinda think there shouldn't be the duplication between pages and entry pages on this anyways (as long as they do show the same values, which they do as of now), so maybe we add session duration in replacement on entry? Exit rate I agree for sure. Do you have an agreed upon formula in mind? I've seen a couple of varying ones in my research, would you rather just do the way GA does it, i.e. exits/total, or unique exits/unique visitors - or something else? I do definitely think this all be awesome with #458, and maybe I can help out with that after I do some of the other stuff I'm planning; but I'll coordinate with the person who started on it earlier when that time comes. |
Nice! Another note: I think it's best to retitle "Unique Entrants" to "Unique Entrances" to keep everything consistent. I prefer to keep the GA formula (number of exists / number of pageviews) in this case (and in general too unless when we can do something much more accurate/useful). New users frequently compare our numbers to what they see in GA so they get a smoother experience when our definitions/numbers match to what they expect to see. Not sure what you mean by "maybe we add session duration in replacement on entry"? You mean to replace unique or total entrances column with Visit Duration or? Mobile is a bit of an issue in general. On my own mobile, the last column is already cut (I cannot slide over to see the bounce rate) but perhaps it's not the same experience on other screens where rather than cut from display they overflow: |
Can do! Okay cool, will do. I meant to replace one of the two current instances of bounce rate (currently it's both in top pages modal and entry modal) - but maybe any change there would end up just being temporary ahead of #591 anyways. Yeah the modals and a lot of the text overflow is problematic still, it's generally been looking like my fixes to the drop-downs are working alright on mobile; I'll see how I can improve them when I get to them. And no, currently all of them get cut off or compressed, which is of course problematic either way. |
Ahh I understand. Yeah, I like that. We could remove Bounce Rate and include Visit Duration in the Entry Pages report. Then it would have Unique Entrances, Total Entrances and Visit Duration. Visit Duration is definitely more useful than Bounce Rate. |
btw - in my testing the current tab size was adequate on small screens, and making them shorter honestly only made them harder to click, so I didn't change it. If user feedback dictates that they're getting cut off, they can be adjusted I suppose. |
This should be pretty much ready for review. I might go back through and do some refactoring and code cleanup tomorrow, but it should be in its final state as far as my first pass goes. |
Nice, thanks very much @Vigasaurus! Looking forward to checking this out! |
Actually now that I think about it, I think visit duration in entry pages is misleading; I think users would think that is the time on the specific page, not the entire visit. I'm looking into getting time-on-page working (it's a bit complex a query) - which I think would be a better column there if I can get it working. Otherwise, perhaps we should remove the visit duration from entry pages all together. |
"Time on page" is another popular request but it makes most sense to be in the "Top Pages" report itself. It shows average time spent on a specific page. "Visit duration" fits for the "Entry pages" report as it shows the average session/visit duration for those visits/sessions that started with an entrance on a specific page. This is exactly the same way GA displays "Session Duration" in their "Entry Pages" report and something people are used to. |
@Vigasaurus I'm a bit swamped at the moment. Looks great, I'll get to review it later this week :) |
@ukutaht All good, and thanks! Take your time for sure. I'm still whacking about at getting the query for generating time on page given what data is already stored, but its seeming like clickhouse is missing what SQL has to make something like that straightforward. If I get that working any time soon I'll add that to this PR. |
Also I actually just noticed a bug where I compute exit rates for goal-based filters completely wrong (if not crashing all together) - so I'm gonna work on solving that. |
General thoughts Testing it out for correctness and usability first before code review. I ❤️❤️❤️ seeing entry pages for goals. Now we have a much better idea which blog posts and landing pages are driving conversions. Not in scope for this PR but in the future I'd also love to get a conversion rate for each entry page (as well as each source, country, etc). After playing around with it for a while I don't see that much value in surfacing so many different numbers. I am not that deep into analytics but I think optimizing the signal to noise ratio is an important part of the product. Personally I think I would prefer two reports: Top pages: Total pageviews, visitors, exit rate This doesn't really work if we want to add Time on Page and roll bounce rate and visit duration into a single metric. I'm just throwing it out there as a discussion point. How much value is there in seeing total vs uniqe exits, for example? To me it's just noise but maybe there are people who think this is really important. /General thoughts Some specific issues I found
2 and 3 are likely intentional? The For example, what should be shown in 'entry pages' when we have applied a Same for exit pages. When using a |
I personally see no need to do something different with this. All 3 are very common reports (top, entry, exit) and have their use cases. Removing bounce rate from top pages makes that report less useful to understand any sense of quality of a page. And when we get to add time on page and engaged visitors to top pages it will make that one report very crowded (especially without the option to search/filter/sort). Adding entry and exit like it is in the screenshots makes sense to me. Our signal will increase a lot by adding them and I see no extra noise or confusion added to the dashboard. It pretty much follows the design we already do for UTM and devices. Smart use of space and a valuable addition that makes Plausible much stronger in depth without adding any complexity. We should just change "more" to "details" or something like that as even now some people don't understand that there's extra insights under "more" rather than just a longer list of the same. Points 2 and 3 should definitely be part of this as they add a lot of value. Being able to filter the dashboard by a specific entry (or even exit) page I can see being very useful for a lot of different analysis even for us. Last 2 paragraphs make perfect sense too. We should show session based data there. |
Yeah that sounds really cool, I think leaning into goals will definitely be a big thing.
I can agree with that, only issue I could foresee is that without a dedicated Exit Pages report, I dunno how a user could select a
Hmm, I thought I had fixed that, will take a second look.
I think those make sense, should make everything more verbose and specific.
I thought I fixed both of these too 🤔 - maybe I missed an edge case somewhere.
Yeah, I think doing them based on their specific pageview types per session would be great, I considered that in the original but didn't want to add the complexity without discussing it first. |
Beautiful! |
Alright, this should be ready to go with the new changes. I got everything swapped over to work with entry and exit page filters; only things I wasn't able to test fully was Past that, I've now:
I do still need to do a bit more extensive testing, but I think it's mostly ready for review/comments - currently running on my dev box, as always - but I think something like staging may also help with that, since my data set doesn't use any of the UTM stuff and doesn't get much referring traffic so I can't test that as heavily as I would want. |
FYI, this can be tested on https://staging.plausible.io/plausible.io |
@Vigasaurus Not sure which version of Clickhouse you're testing this with. On our staging (which is the same as prod), I get the following error:
|
It seems to me like the entry/exit page queries are not quite right when a page filter is being used. When I click on a row in Top Pages to add a |
Originally, there were only 2 pageviews for `test-site.com`,`/` on `2019-01-01`, but that doesn't make sense when there were 3 sessions that exited on the same site/date.
I'm not exactly sure what the cause of this could be - I have to assume it's the places where I was using
This should now be fixed, I was missing a pretty glaring part of generating sessions that had visited a page (to do this I changed up defp filter_converted_sessions(db_query, site, query) do
goal = query.filters["goal"]
page = query.filters["page"]
if is_binary(goal) || is_binary(page) do
converted_sessions =
from(e in base_query(site, query),
select: %{session_id: fragment("DISTINCT ?", e.session_id)}
)
from(s in db_query,
join: cs in subquery(converted_sessions),
on: s.session_id == cs.session_id
)
else
db_query
end
end but it could potentially be more "elixir-style" doing it this way: defp filter_converted_sessions(db_query, site, %Query{filters: %{"goal" => goal}} = query)
when is_binary(goal) do
converted_sessions =
from(e in base_query(site, query),
select: %{session_id: e.session_id}
)
from(s in db_query,
join: cs in subquery(converted_sessions),
on: s.session_id == cs.session_id
)
end
defp filter_converted_sessions(db_query, site, %Query{filters: %{"page" => page}} = query)
when is_binary(page) do
converted_sessions =
from(e in base_query(site, query),
select: %{session_id: fragment("DISTINCT ?", e.session_id)}
)
from(s in db_query,
join: cs in subquery(converted_sessions),
on: s.session_id == cs.session_id
)
end
defp filter_converted_sessions(db_query, _site, _query), do: db_query lmk. I did also have to change the tests a bit, as there was an instance where there were 3 exits on All the rest of the comments should be resolved/answered - will love to see the new changes up on staging, I think it's in a great spot now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
Three things left on my radar at the moment:
- I cannot integrate this work until I've fixed the historical exit page data. Waiting on the Clickhouse repo to get some help with that
- I noticed the loading states are different between the Top Pages and Top Sources reports now. During loading, Top Pages shows the header and tabs but Top Sources doesn't. I don't really have a preference, probably showing what we can is a good thing, but the loading states should be consistent. Up to you which style you go with.
- I think the query for total pageviews for exit rate can be simplified. Not a fan of sending boolean flags to modify the behaviour of a function. I worked out a simpler way, can you confirm this works without regressions? pr/712...pr-712-queries
Found a regression: When adding a filter This is caused by removing this part from the
BTW @Vigasaurus this code is some of the most tangled stuff I've written... I'm basically rewriting the query interface for the API to see if I can find some good abstractions that make reasoning about this easier. If you have ideas let me know. Everything is pretty basic and easy except for making the I think pulling stuff up is the way to go instead of pushing things down. A lot of the subtle filtering and joining happens too deep in the guts of the |
Yeah, absolutely. Surfacing completely wrong data is definitely 😬
Yep makes sense. I think
Yeah I definitely felt a bit gross doing that, will take a look at your way 👍 . I will say, I don't fullyunderstand that format, elixir pattern matching is weird; but I get the gist.
Don't you think we can now properly use the page-based filters for bounce rate? i.e. I'm not necessarily a huge fan of having that filter mean something that it isn't (especially not when that other thing is also an available filter) with bounce rate. This will obviously add some more complexity to that bounce rate calculation, but I think it makes more sense like that. But honestly I can see it both ways in that page for bounce rate may be accurate-ish. Thoughts? Edit: Same would go for visit duration - these 2 are the last two instances of
Yeah I think pulling stuff up would definitely help, but it also kinda feels like pulling stuff up and keeping the bases very bare would end up with a decent chunk of repetition in each upper function - this is exemplified with how many of the current functions are able to use the more complex bases so well, and with redone bases they'd need to do all of that themselves. Honestly though, I think the actual query generation bits are alright and very powerful at the current moment, they just get a bit messy with how much each function does alone - e.g. base_w_sessions_bare - and their names aren't necessarily super conducive to show what they do. So I wonder if just splitting up pieces - e.g. reusing a version of As far as page and goals are concerned, I think the biggest thing for pages is going to be coming up with a consistent definition for how you want pages to function now with exit and entry pages - i.e. should it represent all sessions ever visited the page, and ensuring that every function uses that consistent definition (I think they currently do, but this could get out of hand fast). As for goals, I think the biggest pain point is just that some queries absolutely shouldn't use goals (the To be clear though, I do think the current system is very powerful and honestly pretty straightforward once it becomes clear, but it just sometimes take a while of prodding to get everything clear. Also a sidenote, I'm soon going to be making pageview goals a bit more disgusting (code-wise), but hopefully a lot more powerful otherwise. I've got the whole regex/globbing stuff almost fully set up, just need to go fix an issue where uniques are being duplicated (because a 392% conversion rate doesn't quite work heh). But the actual clickhouse regex matching etc is pretty straightforward, luckily. |
This doesn't quite work the way you'd want, because while removing the page filter works to satisfy that second condition on applying the |
New query looks good! Thanks for making the change. On bounce rate
As a programmer I agree that this feels nice and would make the backend code cleaner. But it could lead to very questionable user experience. Let's say someone clicks on the Same would go for the 'Total pageviews' number, for example. If we accept a common definition that a BTW when building the API I've checked how Google Analytics works to make sure we're not too far off industry standard. Everything we do is pretty consistent with them. When you request
I would love to able to have a consistent definition of what the When querying for bounce_rate and visit_duration then page means entry_page. When querying for entry and exit pages or using a goal filter, then page filter segments sessions that visited that page. When querying for total pageviews, then page filter applies at the event level without sessions taken into account I'm open to brainstorming some ways we could make the page filter consistent without confusing our users. Maybe we could figure out ways of labelling things differently so they are easier to understand |
Yeah that makes sense, I kinda figured it would be the case. The programmatic approach is of course what I'd prefer but that UX would definitely be problematic for anyone not familiar with the actual setup. (And I'm sure a couple months ago, before I looked at any of the backend, I'd have agreed that the page filter alone should indeed mean entry page in those contexts) And considering that overall visit duration and overall bounce rate are now the only two remaining instances where this is a concern, I think kicking the can down the road for better explanations and preferring the UX is 100% the right call. I'll make the relevant change there 👍 |
👏 this should be a good one, think people will like it |
looking forward to playing with this one too! thanks @Vigasaurus! |
Changes
Tests
Changelog
Documentation
Unsure if this needs a docs update - I'd imagine not since it's straightforward but dunno.
Screenshots