Missing session recordings for user #2927
Comments
Seems like this was an issue with curl rather than event capture - I set up a test suite (code below for reference) and the problem did not occur when the request was made by the browser. Something else is (still) going wrong for that company, not sure why session recording is not working.

POC test

```js
/// <reference types="cypress" />

describe('Event capture', () => {
    given('options', () => ({ capture_pageview: false }))
    given('sessionRecording', () => false)
    given('supportedCompression', () => ['gzip', 'lz64'])

    // :TRICKY: Use a custom start command over beforeEach to deal with given2 not being ready yet.
    const start = () => {
        // Stub /decide so session recording and compression are enabled deterministically.
        cy.route({
            method: 'POST',
            url: '**/decide/*',
            response: {
                config: { enable_collect_everything: true },
                editorParams: {},
                featureFlags: ['session-recording-player'],
                isAuthenticated: false,
                supportedCompression: ['gzip-js', 'lz64'],
            },
        }).as('decide')

        cy.visit('./playground/cypress')
        cy.posthogInit(given.options)
    }

    it('works', () => {
        start()

        // Replay a previously captured payload through posthog-js' queued-event handler.
        cy.readFile('../posthog/string.json').then((body) => {
            console.log(body)
            cy.posthog().then(($posthog) => {
                $posthog._handle_queued_event('http://localhost:8000/s', body)
                console.log($posthog._handle_queued_event)
            })
        })
    })
})
```
Reproducing

Managed to reproduce this locally by:

Similar to above, normal events show up in the db but full_snapshot ones do not.

Extra scripts for historic purposes

Digging in, I saved the full snapshot events to a file:

```python
if len(data["snapshot_data"]) > 1000000:
    open('dump.json', 'w').write(json.dumps(data))
```

And attempted to run this via a python shell:

```python
from ee.clickhouse.sql.session_recording_events import INSERT_SESSION_RECORDING_EVENT_SQL
from ee.kafka_client.client import ClickhouseProducer
from ee.kafka_client.topics import KAFKA_SESSION_RECORDING_EVENTS
from datetime import datetime
import json

data = json.loads(open('dump.json').read())

p = ClickhouseProducer()
produce = lambda: p.produce(sql=INSERT_SESSION_RECORDING_EVENT_SQL, topic=KAFKA_SESSION_RECORDING_EVENTS, data=data)
produce() or datetime.now()
```

Nothing interesting shows up after this in the clickhouse logs.

Local fix

I think this narrows things down to either kafka or clickhouse. For this site, the raw json message is 3962402 bytes. Might we be running into kafka message size limitations? As per https://stackoverflow.com/questions/21020347/how-can-i-send-large-messages-with-kafka-over-15mb I did a couple of changes locally:

```diff
diff --git a/ee/docker-compose.ch.yml b/ee/docker-compose.ch.yml
index e70a5aed..b2dfdf59 100644
--- a/ee/docker-compose.ch.yml
+++ b/ee/docker-compose.ch.yml
@@ -106,3 +106,6 @@ services:
         environment:
             KAFKA_ADVERTISED_HOST_NAME: kafka
             KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+            KAFKA_MESSAGE_MAX_BYTES: 1000048588
+            KAFKA_REPLICA_FETCH_MAX_BYTES: 1000048588
+            KAFKA_FETCH_MESSAGE_MAX_BYTES: 1000048588
diff --git a/ee/kafka_client/client.py b/ee/kafka_client/client.py
index 7c7f9aed..fad6a08e 100644
--- a/ee/kafka_client/client.py
+++ b/ee/kafka_client/client.py
@@ -34,7 +34,7 @@ class _KafkaProducer:
         elif KAFKA_BASE64_KEYS:
             self.producer = helper.get_kafka_producer(value_serializer=lambda d: d)
         else:
-            self.producer = KP(bootstrap_servers=KAFKA_HOSTS)
+            self.producer = KP(bootstrap_servers=KAFKA_HOSTS, max_request_size=1000048588)

     @staticmethod
     def json_serializer(d):
```

After this session recording worked. @fuziontech mind handling bumping these in the production kafka cluster and in our producers?
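Not part of the original comment, but for anyone reproducing locally: a minimal sketch of the producer-side size ceiling the diff above works around, assuming kafka-python is installed, a broker listens on localhost:9092, and using a throwaway topic name (not PostHog's real topic):

```python
# Sketch only: shows kafka-python rejecting a ~4 MB payload under the default 1 MB limit.
# Assumptions: local broker on localhost:9092, "session_recording_events_test" is a scratch topic.
import json
from kafka import KafkaProducer
from kafka.errors import MessageSizeTooLargeError

payload = json.dumps({"snapshot_data": "x" * 4_000_000}).encode("utf-8")  # roughly the failing site's size

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # default max_request_size is 1048576 bytes
try:
    producer.send("session_recording_events_test", payload)
except MessageSizeTooLargeError as err:
    print("rejected client-side:", err)

# With the limit raised (the broker's message.max.bytes must be raised too, as in the compose change):
big_producer = KafkaProducer(bootstrap_servers="localhost:9092", max_request_size=1000048588)
big_producer.send("session_recording_events_test", payload).get(timeout=10)
```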
Tried out sending the raw insert statement to clickhouse as suggested by @fuziontech. This however runs into max query size limits:
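For context (not from the thread itself): ClickHouse caps the parsed portion of a query at max_query_size, 256 KiB by default, so a raw INSERT with the multi-megabyte snapshot inlined into VALUES gets rejected. A rough sketch under assumptions (clickhouse-driver, a local server, illustrative table/column names, quoting/escaping elided):

```python
# Rough sketch, not the actual command used in the thread.
# Assumes clickhouse-driver, ClickHouse on localhost, and the dump.json produced earlier.
import json
from clickhouse_driver import Client

snapshot = open("dump.json").read()  # ~4 MB of JSON; escaping of quotes elided for brevity
client = Client("localhost")

raw_sql = f"INSERT INTO session_recording_events (snapshot_data) VALUES ('{snapshot}')"

# With the default max_query_size (262144 bytes) this fails with "Max query size exceeded";
# passing a larger value as a per-query setting is one way around it.
client.execute(raw_sql, settings={"max_query_size": 10_000_000})
```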
We're currently on Heroku with our kafka setup, but we'll move off of it soon. Hopefully we can update the max sizes then. @fuziontech - assigning you this issue until the kafka cluster has moved.
Another user reported this error.
Temporarily solves #2927, though the error will rear its head again with plugin-based ingestion. Opened https://github.com/PostHog/posthog/issues/3068 for a long-term solution.
* Hotfix for missing session recording events

  Temporarily solves #2927, though the error will rear its head again with plugin-based ingestion. Opened https://github.com/PostHog/posthog/issues/3068 for a long-term solution.

* kwargs everywhere
* use settings over kwargs
Every week, 1-2 users mention that they have problems getting all their sessions recorded. I usually just explain our current technical limitations to them, but is there some other, more helpful information we can ask them for to improve this? In addition to the above, for reference, a user mentioned recently that PostHog captures somewhere between 30-50% of their sessions while FullStory captures 100%.
Not right now. These are pretty clear product/technical limitations we're dealing with here, and both getting visibility into them and fixing them require development focus from our side.
What's the state of this? @rcmarron @alexkim205
For customers who have upgraded posthog-js, this seems to be resolved. The percent of "missing recordings" is down from 10-50% (depending on their website) to <1%. At some point, we should look into the remaining 1%, but I think it's safe to close for now.
Bug description
This came from a client report: https://posthog.slack.com/archives/C01HY5TRZP0/p1610460342002100
For this user, session recording was not working despite being enabled. Digging in a bit more, our db correctly contained a lot of events, but not full payload ones.
Expected behavior
Session recording events are correctly received
How to reproduce
Visit site referenced via slack above. See no recordings show up.
Environment
All
Additional context
This only seems to happen relatively rarely globally, but consistently for this site.
I dug into this technically and what I found was pretty fascinating:
The capture request errors with `Not a gzipped file`. Reading `request.body` locally reveals only bytes up until the NULL bytes are read, even if content-length indicates it should be larger.

I asked about this behavior on django IRC + googled, but could not find anything indicating whether NULL bytes in the request body are unacceptable or if django is behaving wrong here.
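A small, hypothetical debugging view (not in the posthog codebase; the view name and route are made up) that makes the truncation visible by comparing the declared Content-Length with what Django actually hands over:

```python
# Hypothetical helper for debugging only.
from django.http import JsonResponse

def capture_debug(request):
    declared = int(request.headers.get("Content-Length", 0))  # what the client claims it sent
    received = len(request.body)                               # what Django actually read
    return JsonResponse({"declared": declared, "received": received, "truncated": received < declared})
```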
Diagnosis
At least locally, django uses runserver.py under the hood, which in turn uses a WSGIServer.

In the end, the thing that reads the request body buffer is cpython's BufferedIO implementation, which assumes the NULL bytes it reads are to be ignored.
Not sure if that's correct or not, but that's our world. Not 100% sure that gunicorn behaves in an identical way.
Misc context
Example sentry error: https://sentry.io/organizations/posthog/issues/2079528998/events/d8bf9c6fca384ee988d1288073da5ae4/?project=1899813&statsPeriod=14d (Not a gzipped file (b'\x1f\x00')), though my own is not reported.
Curl request for reproducing NULL byte behavior:
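The original curl command did not survive here; as a purely hypothetical stand-in (the endpoint path is taken from the test above, while the compression query parameter and payload bytes are assumptions), a request with embedded NULL bytes could be sent like this:

```python
# Hypothetical reproduction, not the original curl command.
# Assumes a local posthog instance on http://localhost:8000 and the /s capture endpoint.
import requests

# A gzip-js payload legitimately contains NULL bytes; a couple of literal ones are enough to illustrate.
body = b"\x1f\x8b\x08\x00" + b"compressed-looking-data" + b"\x00\x00trailing-bytes"

resp = requests.post(
    "http://localhost:8000/s/?compression=gzip-js",   # query parameter is an assumption
    data=body,
    headers={"Content-Type": "text/plain"},            # matches how posthog-js sends it
)
print(resp.status_code, resp.text)
```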
Next steps
To avoid a CORS preflight we send the request with content-type text/plain. Using application/x-www-form-urlencoded would bloat the request payload size significantly. One thing worth trying is multipart/form-data.
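Not from the issue, but to make the multipart idea concrete: a rough sketch (the endpoint comes from the test above; the field name is an assumption) of posting the compressed payload as multipart/form-data, which, like text/plain, is a CORS-safelisted content type and so also avoids a preflight while keeping the body binary-safe:

```python
# Rough sketch only; the real change would live in posthog-js, this just shows the request shape.
import gzip
import json
import requests

snapshot = gzip.compress(json.dumps({"event": "$snapshot", "properties": {}}).encode("utf-8"))

resp = requests.post(
    "http://localhost:8000/s/",                                          # endpoint from the test above
    files={"data": ("data.gz", snapshot, "application/octet-stream")},   # field name is an assumption
)
print(resp.status_code)
```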
Extra complexity here might come from needing to make sure newer posthog-js stays compatible with older versions of posthog.
It is also worth updating fflate, which has had some newer developments recently.
Thank you for your bug report – we love squashing them!