SURVEY: Who is using Jaeger #207

Open
badiib opened this issue Jun 14, 2017 · 42 comments

Comments

@badiib
Contributor

badiib commented Jun 14, 2017

Hi, you are in a group of individuals who have created or commented on issues in the Jaeger repository, and we are doing a simple informal survey about Jaeger usage. If you could answer the following questions, it would be very valuable for gauging interest in the project:

  • If applicable, what company/organization do you represent? How many software engineers?
  • How are you using Jaeger? E.g. full production deployment, considering, experimenting, or "I am not using Jaeger" etc.
    • How long have you been using Jaeger?
    • If you are not using Jaeger but chose another tracing system, what were the reasons?
  • How many services (or microservices) exist in your system layout?
    • How many of them are traced?
  • Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.
  • What types of problems are you solving with tracing?

Also consider adding your organization to ADOPTERS.md.

@jkandasa @Sunfaces @jbdalido @princeop @pavolloffay @mabn @jpkrohling @nlamirault @JodeZer @prestonprice57 @jrbury @objectiser @sloev @hwinkel @Madhu1512 @yuekui2 @valichek @dianvaltodorov @ZhouZiHe @LoungeFlyZ @jeluard @diegofernandes @d-ulyanov @jyothepro @yqf3139 @tomersimis @ruinanchen @szdavid92 @anuptalwalkar @hekike @sul4bh @Strandedpirate @julianste @awhiteside @nklmish @sweatybridge @kevinearls @felixbarny @hzariv @nlamirault @longXboy @drzero42 @xdralex @philipgian @bharat-p

@JodeZer

JodeZer commented Jun 15, 2017

  • A financial system service provider company from Shanghai
  • Experimenting
  • To be honest, we are leaning more towards Zipkin. My team members are more familiar with ES and MySQL than Cassandra, and our Java coders like Zipkin. Though I, as a pure Go coder, prefer Jaeger.
  • We are in the middle of a microservice transformation, and our system is half Java and half Golang.
  • BTW, we selected a Kafka + ES solution with Zipkin, which Jaeger does not provide.

@yqf3139

yqf3139 commented Jun 15, 2017

If applicable, what company/organization do you represent?

I am a contributor to fission, which is a FaaS solution on top of Kubernetes.

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

We need to integrate a distributed tracer for two uses:

  • Help to troubleshoot performance problems with fission itself.
  • Provide a tracer handler to users so that they can instrument their code easily and trace the functions as part of a bigger solution.

Currently I am doing some experiments on the integration.

If you are not using Jaeger, why not?

Will find myself some time to try Jaeger. It seems Jaeger has better client library support.

How many services (or microservices) exist in your system layout?

Around 10 microservices. Excluding user functions, which are also services evolving over time.

@nlamirault

nlamirault commented Jun 15, 2017

  • If applicable, what company/organization do you represent?

I work for a subsidiary company of Orange.

  • How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

We are experimenting with OpenTracing in a future API Gateway service.

  • If you are not using Jaeger, why not?

We use Jaeger with a Kubernetes deployment.

  • How many services (or microservices) exist in your system layout?

Around 10 services.

  • Which storage are you using?

Cassandra.

@codefromthecrypt

codefromthecrypt commented Jun 15, 2017 via email

@jbdalido

jbdalido commented Jun 15, 2017

  • Zenly, a live location sharing social network (https://github.com/znly)
  • Full production deployment on top of scylladb
  • Around 10 services using jaeger in production, more coming

@JodeZer

JodeZer commented Jun 15, 2017

@jbdalido glad to see scylladb !

@bharat-p
Member

bharat-p commented Jun 15, 2017

  • Elastica (part of Symantec)
  • Experimenting in QA/Dev systems
  • Around 10-15 services/microservices

@hzariv

hzariv commented Jun 15, 2017

  • If applicable, what company/organization do you represent?
    --- eBay, Inc

  • How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.
    --- We are currently evaluating Jaeger and Open Zipkin for OpenTracing

  • If you are not using Jaeger, why not?
    --- We have not ruled out Jaeger yet. We used the Jaeger Java client first and are now evaluating the backend that was recently open sourced. Lack of streaming support for the collector is an issue for Jaeger.

  • How many services (or microservices) exist in your system layout?
    --- 500+

Also, integration with service mesh proxies such as Envoy or Linkerd is important to us.

@xdralex

xdralex commented Jun 15, 2017

If applicable, what company/organization do you represent?
Stitch Fix

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.
Considering/experimenting

If you are not using Jaeger, why not?
The environment for which we are considering Jaeger is mostly Python 3, so waiting either for this pull request to be merged or an alternative implementation :)

@mabn

mabn commented Jun 16, 2017

If applicable, what company/organization do you represent?
Base CRM

How are you using Jaeger?
Experimenting in production - there's a process which listens on kafka to our custom traces, converts them and publishes to jaeger.

If you are not using Jaeger, why not?
Traces with ~1M spans make jaeger hard to use, have to deal with it first.
As for instrumenting services with opentracing - this will take time, only 1 service has it so far.

How many services (or microservices) exist in your system layout?
100+

Storage
We're using AWS managed Elasticsearch - mainly because it's managed, but also because we have experience with ES and not with Cassandra. I'm still trying to make it work properly though - right now (2017-09-22) it performs poorly and drops a lot of spans because indexing does not use bulk API, indices are created without index.translog.durability=async and AWS ES requires signing of each index so there's additional proxy to go through.
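The two indexing issues mentioned above (no bulk API, synchronous translog) can be illustrated with a minimal sketch. It only builds request bodies and never talks to a cluster. The index name `jaeger-span-2017-09-22` and the span fields are hypothetical, and this is not how Jaeger's Elasticsearch writer is implemented. Older Elasticsearch versions may also expect a document type in the action line.

```python
import json


def bulk_body(index, spans):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.

    Every document is an action line followed by a source line,
    and the whole body has to end with a newline.
    """
    lines = []
    for span in spans:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(span))
    return "\n".join(lines) + "\n"


# Index settings that trade some durability for indexing throughput.
ASYNC_TRANSLOG_SETTINGS = {"settings": {"index.translog.durability": "async"}}

# Hypothetical spans; real Jaeger documents carry many more fields.
spans = [
    {"traceID": "a1", "spanID": "s1", "operationName": "GET /users"},
    {"traceID": "a1", "spanID": "s2", "operationName": "SELECT"},
]
body = bulk_body("jaeger-span-2017-09-22", spans)
```

Sending one such body per batch replaces one HTTP request per span with one request per batch, which is usually where most of the indexing throughput comes from.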

@hekike

hekike commented Jun 21, 2017

If applicable, what company/organization do you represent?
RisingStack

How are you using Jaeger?
Experimenting with automatic instrumentation for Node.js: https://github.com/RisingStack/jaeger-node

If you are not using Jaeger, why not?
Node.js async_hooks is still in experimental phase.
Currently, our own tracing is more feature complete: http://trace.risingstack.com

How many services (or microservices) exist in your system layout?
50+ (our product's backend)

@pvlugter

pvlugter commented Aug 2, 2017

Lightbend has OpenTracing integration for Akka (and this is being extended to more Lightbend technologies, such as Akka HTTP, Play, and Lagom). Many of our customers are interested in tracing for distributed systems or microservices. The Jaeger client is used as the default OpenTracing client to report to Jaeger or Zipkin, giving our customers the option of using Jaeger.

@Dieterbe
Contributor

Dieterbe commented Aug 9, 2017

If applicable, what company/organization do you represent?

GrafanaLabs

How are you using Jaeger?

currently prototyping an implementation for our tsdb with the goal of validating performance and suitability and then taking to production.
potentially we may add opentracing to our other software (like Grafana) as well.
our most urgent need was just getting rich, context-specific distributed logging in place so we can diagnose performance trouble and jaeger looks like a good fit. In particular compared to "just distributed logging" systems like ELK/crate or oklog, we realized we want tracing not just logging.

How many services (or microservices) exist in your system layout?

We have about 20 different projects that we run, but many of them run multiple times (many of our customers have dedicated single-tenant deployments in Kubernetes).

UPDATE sept 22
We're now using Jaeger in prod for 2 different projects (each running hundreds of times due to multi-tenancy) and we're also working on adding Jaeger support into Grafana itself.

backend: cassandra

@frankgreco

frankgreco commented Aug 15, 2017

If applicable, what company/organization do you represent?

Northwestern Mutual

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

I developed Kanali, which we use to proxy all production traffic in our Kubernetes clusters. Kanali integrates with OpenTracing to provide end-to-end distributed tracing. I love the Jaeger project as it has the most robust and clean UI for OpenTracing IMHO.

How many services (or microservices) exist in your system layout?

We currently use Jaeger to visualize tracing for 100s of microservices. These traces are used by 1000s of developers every day.

@otisg

otisg commented Aug 28, 2017

Interesting to hear folks say jaeger has a better client library,
especially as Jaeger is OpenTracing which is supposed to make that point
moot between systems.

@adriancole I think people say this because OpenZipkin doesn't seem to have an OpenTracing-compatible Python or Node tracer, only Java and Go, or, if it does, it's not immediately obvious.

@ejwood79
Contributor

ejwood79 commented Sep 9, 2017

If applicable, what company/organization do you represent?

Under Armour

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

Limited production deployment, expanding.

How many services (or microservices) exist in your system layout?

100s.

@jnewmano
Contributor

jnewmano commented Sep 15, 2017

#396 @black-adder

  1. If applicable, what company/organization do you represent?

Weave

  2. How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

Full production deployment across both Kubernetes and virtual machines. Using OpenTracing+Jaeger with Cassandra for storage.

  3. How many services (or microservices) exist in your system layout?

100s of microservices

@Dieterbe
Contributor

Am I the only one who finds "How many services (or microservices) exist in your system layout?" an ambiguous question? I don't understand if this means the number of unique software projects, or the number of daemons running (where you count all copies of the same service running).

@frankgreco

@Dieterbe I take service to be a unique microservice. A good analogy would be a Kubernetes service.

@pavolloffay
Member

Hi all, @jnewmano @ejwood79 @otisg @frankgreco @Dieterbe @pvlugter @hekike @mabn @xdralex @hzariv @bharat-p @jbdalido @nlamirault @yqf3139 @JodeZer

Could you also please mention which storage you are using, Cassandra or Elasticsearch? Edit your comment or just comment below.

Thanks

@ejwood79
Contributor

ejwood79 commented Sep 24, 2017 via email

@otisg

otisg commented Sep 25, 2017

Elasticsearch here at Sematext

@bigkraig

We're in the experimentation phase at Ticketmaster. There are hundreds of microservices that will need to be instrumented, but now that a few teams have started tracing, interest is growing.

@B0go
Member

B0go commented May 4, 2018

If applicable, what company/organization do you represent?

https://github.com/ContaAzul | http://contaazul.com

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

We just deployed it to production in our Kubernetes cluster saving data to ElasticSearch on AWS (AWS Elastic Search Service)

How many services (or microservices) exist in your system layout?

~100 instances of ~ 50 services

yurishkuro pushed a commit that referenced this issue May 5, 2018
refs  #207 (comment)

Signed-off-by: Victor Bogo <b0go@users.noreply.github.com>
caniszczyk added a commit that referenced this issue Oct 16, 2018
#207 (comment)

Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
@caniszczyk
Contributor

@trondhindenes thanks, added you here: #1121

yurishkuro pushed a commit that referenced this issue Oct 16, 2018
#207 (comment)

Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
@zdicesare
Contributor

zdicesare commented Oct 17, 2018

If applicable, what company/organization do you represent?
Vistar Media

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.
We are using Jaeger in an AWS-based stack for performance analysis and debugging in all envs. We annotate traces with business logic metadata as well.

We have the Jaeger infrastructure running in ECS and deployed via CloudFormation, the agents are deployed both in ECS and paired with ElasticBeanstalk applications.

How many services (or microservices) exist in your system layout?
Less than 10, but this is increasing. We trace some services that are isolates and also are experimenting with tracing our builds (we use Bazel)

Storage
AWS hosted ElasticSearch

@Puneeth-n

Puneeth-n commented Oct 30, 2018

If applicable, what company/organization do you represent?
@comtravo

How are you using Jaeger?
Production on a subset of microservices.

If you are not using Jaeger, why not?
Currently we are using Jaeger but considering OpenCensus as it matures, because what we really miss is good auto-instrumentation support for Node.js. We forked the auto-instrumentation from RisingStack and fixed some small issues.

DataDog ships their own opentracing-api compatible tracer along with auto instrumentation which is cool.

How many services (or microservices) exist in your system layout?
26

Storage
AWS ES

@ThomWright

If applicable, what company/organization do you represent?

Candide
@candide-eu

How are you using Jaeger?

Full production.

Running on GKE with an Elasticsearch backend hosted on Elastic Cloud.

We have the client library integrated into our NodeJS service shell library to automatically trace inter-service requests.

How many services (or microservices) exist in your system layout?

>30 k8s services in our prod environment. Most of them Jaeger-enabled.

@clyang82
Contributor

clyang82 commented Nov 14, 2018

elasticsearch in IBM Cloud Private with tls enabled

@EaconTang
Contributor

EaconTang commented Aug 26, 2019

Q: If applicable, what company/organization do you represent? How many software engineers?
A: Tencent TEG Infosec Department, about 300+ engineers.

Q: How are you using Jaeger?
A: Full production deployment.

Q: How long have you been using Jaeger?
A: Since May 2019, so about 4 months.

Q: If you are not using Jaeger but chose another tracing system, what were the reasons?
A: We are using Jaeger.

Q: How many services (or microservices) exist in your system layout?
A: At least 100 services.

Q: How many of them are traced?
A: At least 10 services are traced, and this number would be about 80+ at the end of this year.

Q: Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.
A: Kafka + ES, currently about 600 million spans each day.

Q: What types of problems are you solving with tracing?
A: We use Jaeger for monitoring health of rpc servers, analyzing root cause and drawing service topology.

@d-ulyanov

Q: If applicable, what company/organization do you represent? How many software engineers?
A: Ozon (e-commerce, marketplace), about 500 engineers.

Q: How are you using Jaeger?
A: Full production deployment (for both Kubernetes and legacy non-Kubernetes services).
Our setup of Jaeger is strongly modified and most of the components have been rewritten (except for UI, see details below)

Q: How long have you been using Jaeger?
A: ~1 year

Q: If you are not using Jaeger but chose another tracing system, what were the reasons?
A: After several months of using Jaeger our developers asked us to add more advanced sampling policies to get more insights: priority sampling for traces with errors, long traces, etc. Probabilistic sampling was cool at the start, but it offers too few options when you're troubleshooting in production. Also, there was a question about logs: how to use span logs but avoid writing logs to 2 places.
Finally, we replaced the Jaeger agent and collector with our own implementation.
Main features: tail-based sampling (traces with errors, traces with anomalously high duration, etc.), keeping ALL traces in memory for 30m (searchable from Jaeger UI), a Jaeger UI backend integrated with our logging system (it attaches logs to spans on-the-fly, so we're not writing span logs to Jaeger's Elasticsearch), and building a near-realtime dependency graph with RPS/RT for each edge.
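To illustrate the tail-based sampling idea described above, here is a minimal sketch, not Ozon's implementation. It assumes spans are plain dicts with hypothetical `trace_id`, `duration_ms` and `error` fields, and it makes the keep/drop decision only after all spans of a trace have been buffered. The 1000 ms threshold is an arbitrary stand-in for "anomalously high".

```python
from collections import defaultdict


def tail_sample(spans, slow_ms=1000.0):
    """Return the IDs of traces worth keeping, decided after all spans are seen.

    A trace is kept if any of its spans carries an error flag or if any
    of its spans ran longer than slow_ms.
    """
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span["trace_id"]].append(span)

    keep = set()
    for trace_id, trace_spans in by_trace.items():
        has_error = any(s.get("error", False) for s in trace_spans)
        is_slow = any(s["duration_ms"] > slow_ms for s in trace_spans)
        if has_error or is_slow:
            keep.add(trace_id)
    return keep


# Hypothetical buffered spans from three traces.
buffered = [
    {"trace_id": "t1", "duration_ms": 12.0},
    {"trace_id": "t1", "duration_ms": 8.0, "error": True},
    {"trace_id": "t2", "duration_ms": 30.0},
    {"trace_id": "t3", "duration_ms": 2500.0},
]
print(sorted(tail_sample(buffered)))  # ['t1', 't3']
```

Unlike head-based probabilistic sampling, the decision here needs the whole trace, which is why a setup like this has to buffer spans for some time before writing to storage.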

Q: How many services (or microservices) exist in your system layout?
A: >500 services.

Q: How many of them are traced?
A: We've built a "scratch" framework as the foundation of every microservice; it comes instrumented with metrics and tracing out of the box, so most of the services are well-instrumented (~95% of services are covered).

Q: Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.
A:
Setup:

  • Our own implementation of Jaeger agents
  • Our own implementation of collectors (3 instances with 12 CPU + 90GB)
  • ElasticSearch (6 instances with 8CPU + 64GB + 3 master nodes with 4CPU + 8GB).

Stats:

  • On collectors: 600k spans / s (~20k traces / s)
  • Collectors keep ALL traces in memory for 30m (that's why they need so much memory). Collectors provide search by trace ID and by service + tags, so it's fully integrated with Jaeger UI.
  • Sampled to storage: 20k spans / s (~1k traces/s)
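As a rough back-of-the-envelope check of the numbers above, the following sketch derives a few ratios. It assumes, purely for illustration, that all collector memory (3 instances with 90 GB each, counted as decimal GB) is available for spans, so the per-span figure is an upper bound rather than a measured value.

```python
# Figures quoted in the setup and stats above.
collector_spans_per_s = 600_000
collector_traces_per_s = 20_000
stored_spans_per_s = 20_000
stored_traces_per_s = 1_000
retention_s = 30 * 60          # traces are kept in memory for 30 minutes
collector_memory_gb = 3 * 90   # 3 collector instances with 90 GB each

# Average trace size seen by the collectors.
spans_per_trace = collector_spans_per_s / collector_traces_per_s

# Share of traces that survive sampling and reach storage.
stored_trace_ratio = stored_traces_per_s / collector_traces_per_s

# Spans resident in memory at any time, and the memory budget per span
# under the assumption that all collector memory holds spans.
spans_in_memory = collector_spans_per_s * retention_s
bytes_per_span = collector_memory_gb * 10**9 / spans_in_memory
```

Under these assumptions the collectors see about 30 spans per trace, about 5% of traces reach storage, and roughly a billion spans sit in memory with a budget of at most about 250 bytes each.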

Q: What types of problems are you solving with tracing?
A:
We're using tracing for 2 main directions:

  • Fast troubleshooting on production and analyzing root cause
  • Building and analyzing service topology (our custom near real-time implementation). Here we also have several directions: a) just understanding service dependencies; b) analyzing the service graph of a particular web page (we usually don't use the full topology because of the tons of services); c) finding bad design practices, like recursive service calls.
  • We're also planning to use dynamic service topology for smart alerting (print root cause right in the alert, smart alerts inhibition, etc..)

Thanks for Jaeger!
And ask me if you're interested in any details :)

@linjmeyer

If applicable, what company/organization do you represent? How many software engineers?

Redbox; ~50 Software Engineers, ~5 DevOps/Delivery Engineers

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or "I am not using Jaeger" etc.

We are using Jaeger in production for all of our applications on Kubernetes, as well as a select set of non-Kubernetes cloud applications. All services are ASP.NET Core (C#). We use a managed ElasticSearch cluster with collectors across our cloud infrastructure to ensure we can perform end to end spans across multiple regions/cloud providers. For Kubernetes we are using the Jaeger Operator and Istio as a service mesh. All services being traced are using the Jaeger C# Client with our own wrapper library to add some additional features like logging the JaegerSpanId and adding Prometheus metrics for the internal Jaeger metrics. Most services are using the remote sampling configuration from the collector.

How long have you been using Jaeger?

Around 6 months, 3 months in production.

How many services (or microservices) exist in your system layout?

70+ Services/Microservices using various cloud providers and k8s.

How many of them are traced?

Around 30 services in both Kubernetes and non-Kubernetes cloud environments.

Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.

  • ElasticSearch
  • One Managed Elasticsearch cluster per environment (Production and Staging)
  • Production environment handles ~10-15 million spans every 3 days (we keep 3 days of history)
  • Remote sampling, probabilistic; 0.3 is our default, with some services occasionally at 100% sampling to debug specific issues
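A probabilistic sampling rate such as the 0.3 default above can be sketched as a deterministic decision on the trace ID. This is one common approach, shown under the assumption of uniformly distributed 64-bit trace IDs; it is not a copy of the Jaeger C# client's code, and `is_sampled` is a hypothetical helper.

```python
import random

MAX_ID = 2**64  # trace IDs are assumed to be uniform 64-bit integers


def is_sampled(trace_id, rate):
    """Keep a trace when its 64-bit ID falls below rate * 2**64."""
    return trace_id < int(rate * MAX_ID)


rng = random.Random(42)  # fixed seed so the run is repeatable
trace_ids = [rng.getrandbits(64) for _ in range(100_000)]
kept = sum(is_sampled(t, 0.3) for t in trace_ids)
fraction = kept / len(trace_ids)  # close to 0.3
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict without coordination, and setting the rate to 1.0 keeps everything for a debugging session.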

What types of problems are you solving with tracing?

We use Jaeger to observe and troubleshoot performance issues and to understand what service-to-service dependencies we have.

liontwinkle added a commit to liontwinkle/go-jeager that referenced this issue Aug 1, 2020
refs  jaegertracing/jaeger#207 (comment)

Signed-off-by: Victor Bogo <b0go@users.noreply.github.com>
liontwinkle added a commit to liontwinkle/go-jeager that referenced this issue Aug 1, 2020
jaegertracing/jaeger#207 (comment)

Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
@Betula-L
Contributor

Betula-L commented Jan 13, 2021

If applicable, what company/organization do you represent? How many software engineers?

bilibili;

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or "I am not using Jaeger" etc.

We are using Jaeger in production for most of our applications on Kubernetes, as well as a few applications deployed directly on machines.
We use Jaeger Agent and Jaeger Collector with few modifications. Those two provide enough features in production.

However, we rewrote the Jaeger SDK and Jaeger Job completely. In our experience, almost all Golang applications can easily use Jaeger for tracing, but others, e.g. Java and Python, cannot. The SkyWalking agent may be a better choice for trace collection, because applications can import a jar more easily than an SDK. Maintaining a tracing SDK for thousands of applications in different languages is a really painful job, especially for Python. We hope to find a painless way to manage that in the future.

How long have you been using Jaeger?

Around 1 year in production.

How many of them are traced?

1000+ Services/Microservices using various cloud providers and k8s.

Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.

We use ClickHouse now, but used ScyllaDB before. Elasticsearch scales poorly, and with Cassandra/ScyllaDB it is hard to run complex queries in many situations.

We ingest 1 million spans/s and keep them for 7 days, for troubleshooting performance issues and maintaining dynamic service-to-service dependencies.

@pavolloffay pavolloffay unpinned this issue Jan 28, 2022
@zdyj3170101136

zdyj3170101136 commented Jul 22, 2022

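If applicable, what company/organization do you represent? How many software engineers?
mihoyo;
How are you using Jaeger? E.g. full production deployment, considering, experimenting, or "I am not using Jaeger" etc.
We use agent -> collector -> Kafka -> Flink and ingester -> ClickHouse.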

We reimplemented jaeger-agent:
1, use WebSocket to forward the raw []byte directly from client to collector.
2, use a Unix domain socket to replace UDP.

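How long have you been using Jaeger?
I have been in charge of it for half a year.
We have been using it for at least 3 years.
If you are not using Jaeger but chose another tracing system, what were the reasons?
How many services (or microservices) exist in your system layout?
Thousands.
How many of them are traced?
All.
Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.
We use ClickHouse to store at least millions of spans per second for 30 days.
What types of problems are you solving with tracing?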
1, Service dependency graph.
We use Google's pprof to make it possible to display the relations between thousands of services, and it looks very good.

Searching by service shows only that service and its upstream and downstream services.
Searching by group shows only the services in that group.
We connect the service dependency graph with metrics: a service node in the graph has not only its name but also the average latency, span count, and error percentage in the time range.

And just like Google's pprof, our UI also has:

  • a service with a high error percentage has a redder node
  • a service with more accumulated relations has a bigger node
  • the line between two services is more vertical the bigger the relation is
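A dependency graph annotated with metrics, as described above, can be derived from spans roughly as follows. This is a minimal sketch under assumed field names (`span_id`, `parent_id`, `service`, `duration_ms`, `error`), not the poster's ClickHouse or pprof based pipeline, and it attaches the metrics to caller-to-callee edges rather than to nodes.

```python
from collections import defaultdict


def dependency_edges(spans):
    """Aggregate caller -> callee edges with call count, mean latency and error rate."""
    service_of = {s["span_id"]: s["service"] for s in spans}
    stats = defaultdict(lambda: {"calls": 0, "total_ms": 0.0, "errors": 0})
    for s in spans:
        parent_service = service_of.get(s.get("parent_id"))
        # Skip root spans and calls that stay inside one service.
        if parent_service is None or parent_service == s["service"]:
            continue
        edge = stats[(parent_service, s["service"])]
        edge["calls"] += 1
        edge["total_ms"] += s["duration_ms"]
        edge["errors"] += 1 if s.get("error") else 0
    return {
        edge: {
            "calls": v["calls"],
            "avg_ms": v["total_ms"] / v["calls"],
            "error_pct": 100.0 * v["errors"] / v["calls"],
        }
        for edge, v in stats.items()
    }


# Hypothetical spans from a single trace.
spans = [
    {"span_id": "1", "parent_id": None, "service": "gateway", "duration_ms": 50.0},
    {"span_id": "2", "parent_id": "1", "service": "orders", "duration_ms": 20.0},
    {"span_id": "3", "parent_id": "1", "service": "orders", "duration_ms": 40.0, "error": True},
    {"span_id": "4", "parent_id": "2", "service": "db", "duration_ms": 5.0},
]
edges = dependency_edges(spans)
```

Filtering the resulting edges by one service name gives the "only this service and its upstream and downstream services" view; the error percentage and call count are the kind of values a UI could map to node color and size.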

2, Full sampling.
After reimplementing the Jaeger agent and replacing the agent's Thrift marshal/unmarshal protocol with a more efficient protocol, we can sample all traces.

3, High accuracy histogram.
We use ClickHouse as the metric store, which makes it possible to store a histogram with hundreds of time buckets for each service/operation;
this would cost hundreds of GB of memory if we used Prometheus.

4, Critical path.
Show each span's true execution time.
We have two different UIs to display it:

  • pprof, grouped by operation (but it has some problems: because one operation can represent multiple spans, it is harder for users to understand)
  • Jaeger UI (we changed the Jaeger UI to show the execution time as a black duration bar)
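One way to read "true execution time" is a span's self time: its duration minus the time covered by its children. The sketch below computes that under the assumption that spans are dicts with numeric `start` and `end` fields; it is a simpler building block than a full critical-path analysis and is not the poster's implementation.

```python
def self_time(span, children):
    """Time inside `span` that is not covered by any child span.

    Children may overlap each other, so their intervals are merged
    on the fly instead of simply summing their durations.
    """
    intervals = sorted(
        (max(c["start"], span["start"]), min(c["end"], span["end"]))
        for c in children
    )
    covered = 0
    cursor = span["start"]
    for start, end in intervals:
        start = max(start, cursor)  # ignore the part already counted
        if end > start:
            covered += end - start
            cursor = end
    return (span["end"] - span["start"]) - covered


# Hypothetical parent span with two overlapping children and one separate child.
parent = {"start": 0, "end": 100}
children = [
    {"start": 10, "end": 40},
    {"start": 30, "end": 60},
    {"start": 80, "end": 90},
]
print(self_time(parent, children))  # 40
```

Summing child durations naively would give 70 covered units here; merging the overlap between the first two children gives 60, so the parent's own time is 40.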

5, Connect trace with runtime/pprof.
We can connect a trace with the runtime pprof profile and show a request's flame graph,
i.e. in which functions the request spent CPU time.

6, Tail-based sampling.
Sample spans by p99 latency and error tag.

7, Package instrumentation.
Elasticsearch, Kafka, net/httptrace, MongoDB, Redis, gRPC, SQL.

8, Explore.
A UI that lets us:

  • search with multiple services/operations
  • group by tag
  • get recommended tagKey/service/operation values (ordered by span count) in search.

outdoorSpirit pushed a commit to outdoorSpirit/Go-Jag that referenced this issue May 3, 2024
refs  jaegertracing/jaeger#207 (comment)

Signed-off-by: Victor Bogo <b0go@users.noreply.github.com>
outdoorSpirit pushed a commit to outdoorSpirit/Go-Jag that referenced this issue May 3, 2024
jaegertracing/jaeger#207 (comment)

Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
@lunkan93

If applicable, what company/organization do you represent?
Elastisys - Creators and maintainers of Compliant Kubernetes

How are you using Jaeger?
Since early 2023, we have been offering Jaeger as a managed service to our customers for their distributed tracing needs, operated and maintained by us.

Which storage do you use?
We deploy Jaeger with a dedicated OpenSearch cluster. This choice was based on a couple of reasons:

  • Based on your recommendation
  • We already run OpenSearch as a core component of our Kubernetes platform to store all container logs
  • Dedicated cluster because we wanted to avoid creating a dependency between our core OpenSearch and Jaeger:
    • To avoid potential compatibility issues blocking us from upgrading our core OpenSearch
    • To avoid putting further stress on our core OpenSearch and have the ability to scale the dedicated cluster individually

What types of problems are you solving with tracing?
We saw an increased interest in a managed distributed tracing solution from our customers, as they wanted to gain deeper insight into their applications, or just a more complete observability stack in general.
