Show HN: SigNoz – open-source alternative to DataDog, NewRelic

510 points by pranay01 2 years ago

We just released a major upgrade to SigNoz with support for Logs management based on ClickHouse. Would love to get any feedback from the community here on what you think and any questions you may have for us

Many big companies like Uber and Cloudflare have been shifting to ClickHouse as their main workhorse for Logs management seeing much better performance. for e.g Cloudflare recently shifted from Elastic to ClickHouse and are seeing 8x improvement in memory/cpu resource requirement in ingestion.

This is our first release with Logs support and we have added support for:

- Filtering logs based on fields

- Full text search in logs

- Live mode to see logs coming in realtime

- Detailed view of logs in table and json format with ability to add filters quickly

- Ability to specify interesting fields which will be indexed by default

pranay01 2 years ago

This is our first release with Logs support and we have added support for:

- Filtering logs based on fields

- Full text search in logs

- Live mode to see logs coming in realtime

- Detailed view of logs in table and json format with ability to add filters quickly

- Ability to specify interesting fields which will be indexed by default

cebert 2 years ago

I like DataDog but it is prohibitively expensive for monitoring serverless applications, even with negotiated rates. I don’t think their sales team has identified a reasonable way to bill applications that entirety use serverless AWS services. We’re looking for alternatives. Paying for DataDog is many orders of magnitude higher than our AWS bill.

pranay01 2 years ago
```
   Paying for DataDog is many orders of magnitude higher than our AWS bill.
```
Wow! This is blowing my mind.
Do you this this case for most companies monitoring serverless applications with DataDog, or there is something specific about your infra which cause this
- cebert 2 years ago
  
  The application I am currently working on is all lambdas ~ 45, DynamoDB, S3, CloudFront, Cognito, and SQS and SNS. My employer has several serverless applications with relatively moderate use, and I work on one of them. Our total cloud costs for the product I work on for our DEV/STAGE/PROD/SANDBOX environments is currently less than $1,000/mo. Our estimated cost of DataDog monitoring for the next year just on my application is at least 23k/yr using negotiated rates. We don’t have crazy traffic, but do have global users invoking all of our lambdas at least once an hour. DataDog charges a fixed monthly cost for each lambda invoked at least once an hour on average. Then, you also need to pay for ingestion, storage, and custom metrics. Just on my product alone with multiple isolated environments, this gets expensive.
  Many other product teams at my work have lightly used serverless apps. The DataDog costs simply aren’t feasible for serverless apps. We’re actively looking into alternatives such as just using CloudWatch, Elastic, etc as it’s a huge cost for us.
  
  VectorLock 2 years ago
  
  Wonder how much it would be if you just used all the native AWS tools for things you get from Datadog.
  
  pranay01 2 years ago
  
  As far as I understand, AWS native tools like Cloudwatch are not as good
  
  pranay01 2 years ago
  
  thanks for the detailed note.
  I was just checking Datadog pricing for serverless, there is says - $7.20 per active function per month. If you are using 45 lambdas, is the number of functions much higher? I am guessing ~200 or so?
  Though I can see, how charging based on functions can quickly shoot up the bill
  
  cebert 2 years ago
  
  The problem is we have 4 isolated environments so it’s 4x number of lambdas. Plus, since you only pay for lambdas when they’re running we also deploy developers PRs in AWS so that we can test their API changes with integration tests before merging those changes in. The fixed cost is a killer. We have developers on our team in India, Ukraine, and the US so even our dev environment is used 24x7 essentially.
  
  alb4 2 years ago
  
  @cebert have you checked out Scanner.dev? It uses sparse skip-list indexing and serverless components to let you query a terabyte of logs in seconds and pricing is around the same as Cloudwatch
  
  xiwenc 2 years ago
  
  What are the 4 isolate environments in your case? Dev, test, accp, and prod? If so, you almost slash the cost by 50% by only monitoring accp and prod. Can be accomplished by introducing a toggle in your lambda’s.
  
  mustyoshi 2 years ago
  
  Are you sure you need to monitor the PR lambdas? If they're going to make it to one of the 3 environments before prod.
  
  smetj 2 years ago
  
  Time to look into what output DD delivers which drives service/product decisions with financial impact and look for an alternative to re-implement. Stop being hypnotized by the fancy blinkenlights.
  
  berkay 2 years ago
  
  @cebert Have you looked at us serverless specific solutions like Thundra (my company), Serverless.com, etc.? I think the cost for use case may be order of magnitude lower since the pricing is only based on number of invocations.
  
  masterofmisc 2 years ago
  
  Can you tell me what the difference is between your DEV and SANDBOX environemnts is? Curious to know.
  
  cebert 2 years ago
  
  We typically use Sandbox for significant deployment changes such as upgrading to Node 16, updating security policies, etc. in an isolated environment. If things break, it doesn’t impact DEV/QA.
  
  masterofmisc 2 years ago
  
  Ahh gotcha.. Thanks for the info. Thats useful to know.
- vasco 2 years ago
  
  It also blows my mind, we are also heavy Datadog users and our Datadog bill is roughly 1/10 of the AWS one. Our architecture isn't fully based on serverless because we like to get work done, but I wonder if that's the only cause or if they are using custom metrics wrong or something along those lines.
  If you're paying less than 10% of infra costs for monitoring, you probably don't have good enough monitoring. But if you're paying more than 25% of your infra costs for monitoring, someone is not doing their job.
  
  salil999 2 years ago
  
  > Our architecture isn't fully based on serverless because we like to get work done
  What do you mean by this?
  
  tepitoperrito 2 years ago
  
  I think he's being funny. I thought it was hilarious. There are huge boons to productivity if you know your stack. I don't know serverless so if I built anything around it it would probably just be shiny object syndrome.
  
  cebert 2 years ago
  
  I think it depends on your use case and organization. My employer has traditionally built on-prem software customers run in their data centers. Everyone wants to move to the cloud now. However, we admittedly don’t have a lot of cloud experience yet. Serverless works well for us as we have a smaller but lucrative customer base (not Netflix scale). Amazon does a lot of heavy lifting for you such as 3 AZs by default, easy scaling, etc. W e provide value to our customers by understanding their domain and business logic challenges. Using serverless helps us focus on that and allows us to grow our cloud expertise without needing to manage k8 clusters or having large teams related to ops.
  We have a lot of request/reply CRUD type requests that are heavier on reads than writes. We use API Gateway to manage websocket connections for us. This type of usage pattern and size of our customer base fits well with serverless.
- josephcooney 2 years ago
  
  A tangentially related anecdote - I heard from a guy from MS that if you turn on Azure's AKS monitoring without any filtering of events applied the cost of the monitoring will be significantly more than running AKS itself.
  
  yodon 2 years ago
  
  I had Azure AKS monitoring turned on for a minuscule, essentially unused hobby project. After about four months the monitoring costs suddenly exploded from about $4/mo to about $4k/mo.
  No idea what happened and MSFT support couldn't tell me what was happening because at more than $100/day burn rate on a hobby project I started deleting everything connected with the effort as fast as possible.
  All I know is my AKS wasn't exploding. Services were still responsive and acting normally in their minuscule cluster, this was just a logging cost explosion.
  Also billing alerts are your friend.
  
  malkia 2 years ago
  
  That might be okay, say you enable all for 10 - 20 seconds, such that all traces are sampled, logs logged, etc, and then it ramps down.
  
  sandermvanvliet 2 years ago
  
  Yeah got burned by that once. Fortunately I had billing alerts in place so we found it quick…
  
  pranay01 2 years ago
  
  This is the default monitoring which comes with AKS setup right?
  
  josephcooney 2 years ago
  
  I'm not sure....I don't think it was turned on by default for AKS. I can do some digging if you want.
- danielodievich 2 years ago
  
  Absolutely. Two of my customers over last two years (a hypergrowth startup and a crypto marketplace) both had 20+MM/year DataDog bills, comparable in magnitude to both their AWS spend (both were built on AWS) and Snowflake spend (which was my area of focus). DataDog's wonderful yet it is pricey and that's why they have that beautiful target on them from all kinds of vendors.
  
  pranay01 2 years ago
  
  Very interesting! I never thought DataDog would be close to AWS spend
  Were these also on AWS Lambda or something else (EKS?)
  
  danielodievich 2 years ago
  
  Both had everything you can possibly get from AWS and then more. I didn't interact with Datadog much except for once loading 4PB of archived DD data into Snowflake to search through it to satisfy govt records request. That was an illuminating project, Datadog can't handle that, but Snowflake sure could.
victor9000 2 years ago

I had the same experience with NR. I couldn't possibly justify paying more for a developer tool than I do for the entirety of operations.

keb_ 2 years ago

I used New Relic for the first time at my current job, and while service it provides to our team is invaluable, the UI is slow and very confusing; I can never remember what button or menu to click to get to where I need to go. The fact that it loads so slowly just makes it worse.

So I am optimistic for alternative solutions like this!

jiggawatts 2 years ago

At $dayjob the devs use both Azure Application Insights and New Relic. Comparing the two, I had the same comments: New Relic is appears useful but is too slow in practice. The developers generally avoid it in their day-to-day work, which defeats the purpose entirely.
We're not in the US, so the cloud-hosted version of New Relic is especially slow because of the added latency of the trans-oceanic network hop.
For comparison, Azure Application Insights can be deployed in our own region. It's not massively faster, but for some UX design reason it feels faster and more pleasant to use. It might be literally just the network latency, and nothing more, but the end result is that it's used more often.
Application Insights isn't perfect by any means:
- Deployment is a PitA and breaks regularly. The documentation related to installing it is confusing, out-of-date, and will guide you down dead ends. For example, I got the profiler component working once on virtual machines, then it broke, and I can't get it working again for the life of me.
- The underlying Log Analytics workspaces are crazy, crazy expensive. They're far more expensive than the competition, which then makes high-level services like Sentinel and App Insights built on top also too expensive for most orgs. For comparison, Log Analytics is about 5x as expensive per GB as AWS CloudWatch logs, and up to 30x as expensive as some other similar services.
If Microsoft just fixed the installer and used reasonable pricing for Log Analytics, the App Insights would be very hard to beat, especially for .NET shops.
I'm hoping open-source tools like SigNoz force down the pricing from "highway robbery" to merely "greedy".
- pranay01 2 years ago
  
  > so the cloud-hosted version of New Relic is especially slow because of the added latency of the trans-oceanic network hop.
  Curious, does the location of the server introduce the latencies (as you mentioned you are not in US)? I would have assumed the latency because of server location would be very small.
  Have you verified that the latency is actually because of the trans-oceanic network hop?
  
  dijit 2 years ago
  
  Consider that each connection is a round trip (IE; hundred+ ms latency is multiplied by 3 due to handshakes being 3-stage.).
  Consider that every fetch of a resource may itself include another resource (IE; a html page which contains a CSS include).
  Now consider that this happens recursively (IE from the above example: a CSS include that itself includes a font or an image).
  It's very easy to get 1+s load times with transatlantic latency alone.
  
  pranay01 2 years ago
  
  thanks for sharing, I never thought this could be an issue
  Question: Does this increase in latency make cloud services less interesting for companies which are not in US? What kind of cloud services will be especially affected
  
  dijit 2 years ago
  
  Cloud services usually have global availability, you choose where you host.
  A lot of people in Europe are using European datacenters.
  That is not to say there's no issues: The consoles can be unbearably slow at times. (Google Cloud being probably the worst offender in my experience, despite being a fan otherwise).
  Amazon supports consoles in other regions, but if you use `console.aws.amazon.com` then it is us-east; it doesn't automatically change it for you.
  Here's the list: https://docs.aws.amazon.com/general/latest/gr/mgmt-console.h...
  Regarding making Cloud less interesting:
  Europe basically follows whatever SV is doing, to make a crass comment: an article could be produced from SV saying eating poop would make better engineering and European "tech" companies would assuredly start buying up the sewage systems.
  Even when it doesn't make sense; we seem to follow.
  
  capableweb 2 years ago
  
  After using AWS for more than 10 years in various capacities, I've never seen that the console is available in multiple regions! Sometimes it is dog slow for me when I'm in a various different geographic locations, I hope this newly learned fact will make my experience slightly better in the future, thanks for sharing that!
  It does make me wonder though, why not automatically redirect people to the console that is closest to them? They could done anything from Anycast DNS to showing a simple little notification showing people there was an alternative possibly closer to them when logging in, but as far as I know, nothing is done about this.
  
  everfrustrated 2 years ago
  
  GP is a little confused. The AWS console always redirects you to the the console hosted in the same region as the resources you're viewing.
  But the initial login request to AWS goes via us-east-1 by default.
  AWS has more recently published the list of login endpoints to use should us-east-1 be offline.
  
  jiggawatts 2 years ago
  
  At least a hundred milliseconds of additional latency is unavoidable across any ocean crossing. It’s just physics.
  All US-hosted web services feel slow here. It’s a baseline sluggishness that permeates everything we use that is cloud hosted.
  The only exception is services that have local instances or replicas of some sort.
  
  pranay01 2 years ago
  
  I see, very interesting.
  Is this latency deterring enough that you prefer running things in your region and not prefer SaaS product which are generally hosted in US/EU?
  Or is this just a discomfort which you deal with?
  
  jiggawatts 2 years ago
  
  Personally speaking, yes. I tend to gravitate to locally hosted services. In some cases it can be a night & day difference.
viraptor 2 years ago

It's actually getting worse than better. Newrelic one is super confusing and the new pricing made us nope out of there and adopt datadog for both metrics and apm. The interface is so much better.
- icelancer 2 years ago
  
  Yeah I agree. It's weird. New Relic was a godsend for us but every iteration of pricing, UI changes, UX... it just gets worse.
  
  bigcat12345678 2 years ago
  
  What's your use case with New Relic? New Relic has been deemed inferior to DataDog across the comments here, wondering how much value you are getting from New Relic nowadays.
  
  icelancer 2 years ago
  
  Haven't used it in awhile after all the changes years back, but it was huge for us in our PHP apps for APM. Can't say I ever used DataDog.
- pranay01 2 years ago
  
  Isn't the new pricing of New Relic much more affordable now? Or they introduce some weird condition on the number of seats, etc
  
  viraptor 2 years ago
  
  Ah yes, sorry, I forgot that was very context specific. With the new plans they killed our very old plan. Which I guess is fair... But still, DD comes out cheaper.
  
  pranay01 2 years ago
  
  got it
Hamuko 2 years ago

There's a reason why they offer a New Relic certificate, so you can prove that you've learned to use the product they're selling.
pranay01 2 years ago

thanks for the kind words. To give it a try and let us know if you have any feedback/questions.
We also have an active slack community if you have any questions on how to set up or have an feedback for us - https://signoz.io/slack

andrewmcwatters 2 years ago

I don't know what other people's experience has been with DataDog, but the user interface experience I had with it was definitely reminiscent of something produced from a dog's rear end.

Definitely a product where no one in the org said "no" to an idea. Felt very Atlassian.

aloknnikhil 2 years ago

Second this. It’s pretty terrible when you use it at scale particularly. Eg: If you have a monitor with some scopes that you use for exceptions, it gets gnarly in the UI when you have like 50 of them. And the inconsistent UI. Notebooks are cool but so half-assed in their features that it kind becomes useless for anything other than basic charts. Oh and feature discovery is a pain too. It took me so long to figure out a way to add a second Y-axis to my charts.
- pranay01 2 years ago
  
  Interesting. do you mean this by scopes - https://docs.datadoghq.com/api/latest/scopes/ ?
  
  aloknnikhil 2 years ago
  
  Not exactly. I’m talking about this workflow. https://www.datadoghq.com/blog/mute-datadog-alerts-planned-d...
  You use specific tags to mute a monitor by adding a scope. This does not scale well with the UI.
  
  pranay01 2 years ago
  
  ah, got it. Thanks
vasco 2 years ago

Been using it for ~5 years full time and the UI is one if the main reasons to use Datadog vs hand rolling it based in Grafana. The productivity of developers creating datadog dashboards vs in other tools was way better.
Have you ever timed how long it takes a normal user to do certain things easily, say during an incident, or you're just crapping on the CSS?
- pranay01 2 years ago
  
  Curious, what areas do you think the DataDog UX is especially better compared to Grafana? Also, which flows do you use the most in DataDog
mustyoshi 2 years ago

I have the opposite opinion. We used Cloudwatch originally but switched to Datadog and it was night and day. Datadog has a beautiful interface for logs, metrics, and dashboarding.
pranay01 2 years ago

Yeah, we have had some of our users mentioning that DataDog has become too bulky and too many bells and whistles.
Curious though, what specific features/UX did you not like in DataDog?
almenon 2 years ago

I have the opposite feeling - to me it looks nice and better than Dynatrace or elastic.
waynesonfire 2 years ago

gatta keep polishing that turd! there is a promo packet that needs filling.

codetrotter 2 years ago

They say:

> if you want to have a seamless experience between metrics and traces, then current experience of stitching together Prometheus & Jaeger is not great.

But I wonder if using Promscale https://github.com/timescale/promscale would make Prometheus & Jaeger not such a big problem as SigNoz imply.

Promscale readme:

> Promscale is a unified metric and trace observability backend for Prometheus, Jaeger and OpenTelemetry built on PostgreSQL and TimescaleDB.

Either way, SigNoz seems interesting indeed. And am glad to see that SigNoz supports OpenTelemetry.

0x457 2 years ago

You don't even need Promscale for this. Grafana has support for displaying information from both plus logs. As long as log messages include TraceID is straightforward use Grafana for debugging.
pranay01 2 years ago

yeah, promscale is an interesting project with similar goal - trying to bring together different signals ( only metrics & traces for promscale) under one hood.
I have not explored the project in detail, but as far as I understand it uses Grafana and Jaeger UI, so I am not sure how seamless is the UI interaction, while SigNoz has a UI built ground up for observability use cases.
ankitnayan 2 years ago

Basically, the choice of DB is different, relational vs analytical DB. IMO clickhouse should be better at ingestion rate and aggregation queries.

ketzu 2 years ago

Is there any introduction to all the services that are included in the docker-compose file [1]?

I wanted to give signoz a try, but the sheer amount of services attached discouraged me, especially as I have to reconfigure them all to work with my setup. (Don't run them directly on the host, instead in a separated network, put the network interface behind traefik, figure out which access they need, provide all the configuration in a nice way without having to clone the full repo just to have the configuration files.)

[1] https://github.com/SigNoz/signoz/blob/develop/deploy/docker/...

rad_gruchalski 2 years ago

Not associated with the project but a quick look suggests: a database (clickhouse), alertmanager, query-service, frontend are signoz components, otel* and hotrod are for distributed tracing. Otel stands for open telemetry (https://opentelemetry.io/), hotrod is a tracing demo app from jaeger: https://github.com/jaegertracing/jaeger/tree/main/examples/h....
Without thinking too much about it, I assume that: hotrod is a demo data source pushing traces to the otel collector, which stores data in clickhouse. Frontend fetches data from clickhouse using the query service. Alert manager probably looks at traces coming in and detects anomalies, so that you can get real-time alerts when things don’t look normal.
- ketzu 2 years ago
  
  Thank you for the quick explanation, that helps a lot and gives me at least a starting point. Although sample services surprise me in the suggested docker-compose from the "how to install" section.
  
  pranay01 2 years ago
  
  yeah, it's not ideal - but we kept it to give users who are installing for the first time to get a hang of the product
  You can easily remove the sample services if you want - https://signoz.io/docs/operate/docker-standalone/#remove-the...
pranay01 2 years ago

The comment by rad_gruchalski is mostly accurate
SigNoz has the following components
- Frontend
- ClickHouse (datastore)
- Alert Manager (this is to monitor metrics and create alerts which you configure in SigNoz)
- Query Service (which is the backend service which talks to datastore & frontend)
- Otel collector ( which the collector provided by opentelemetry to collect telemetry data)
The other two components for sample app which can be commented out
- Hotrod ( which is a sample app)
- load generator
More details here - https://signoz.io/docs/architecture/#architecture-components

reilly3000 2 years ago

I just learned about SigNoz. I spend a lot of time flipping between New Relic and Splunk, and my company spends a ton on both. I’m interested to hear from users and learn if it’s prod ready as a New Relic alternative. I’m especially interested in something that could run locally so I can profile using the same tools that run in prod. Any feedback?

makeavish 2 years ago

Here are few case studies of production users published by SigNoz: https://signoz.io/case-study/

spaceman10 2 years ago

Hi, I want to love this product. This will be the 3rd or 4th time I have looked at the docs to figure out basic setup.

Each time I look into this. I look to see how it can report CPU/Memory/Disk of the systems at large. When I read, all I find on open telemetry and SigNoz is how to integrate with application stacks.

Am I missing something fundamental in that SygNoz/OpenTelemetry do not integrate with host level metrics? I really, really want to use this product. But this extremely BASIC implementation to use the service is 100% missing for documentation as far as I can tell. Even the example page has nothing listed https://signoz.io/docs/tutorials/ ...

So I am either trying to find out how to make SygNoz do something it was not built to do... or the documentation has a huge hole in it.

Let me know how to proceed, if you have time. It's appreciated so that I dont keep walking down the wrong road and hoping to find something useful.

Thanks!

pranay01 2 years ago

Hey, thanks for writing. Our initial focus was on application monitoring - that is why docs are more oriented towards it. But I can understand you may find it tough to understand for infra metrics (CPU/Memory/Disk)
As of now, there are couple of ways to do this: 1. You can enable hostmetrics receiver in opentelemetry collector 2. You can use something like prometheus node exporter and enable prometheus receiver in opentelemetry collector
If you follow this for a VM setup, you should be able to get your host metrics - https://signoz.io/docs/userguide/send-metrics/#enable-a-spec...
If you are on k8s, check out this - https://signoz.io/docs/tutorial/kubernetes-infra-metrics/
If it is still confusing, do drop by in our slack community ( https://signoz.io/slack) We are quite active there and should be able to help you get started
- spaceman10 2 years ago
  
  Many thanks. I took notice of your prometheus mention and had also seen that peppered in the docs for both services (openT and SigN) as a way to gather data.
  As time investment is a dangerous thing I didn't plop myself down for some R&D... But... thanks to your post I'll have another go at it and determine how to get it to hum.
  Thanks again!
  
  pranay01 2 years ago
  
  Awesome! Do let us know if you face any issues, should be fairly simple to do. We are on slack at https://signoz.io/slack if you need any help

jakswa 2 years ago

Goodness I hope this is good because I will make my company gobble it upppp if so.

saintfiends 2 years ago

We thought so too until we found out it doesn't support any kind of SSO
https://github.com/SigNoz/signoz/issues/1188
- bogota 2 years ago
  
  I mean if that is the single thing stopping your company from using it then just add support for it. The cost of DD monthly bill more than supports spending a week or less it would take to add
  
  rad_gruchalski 2 years ago
  
  That or just put the frontend behind a reverse proxy with sso.
  
  hunter2_ 2 years ago
  
  SSO is about making a user account inside a service provider (e.g. TFA) which mirrors that same user account in the identity provider (e.g. Okta). A reverse proxy isn't able to write to the upstream application's user store or otherwise assert the identity of the current user to the upstream application, as far as I'm aware. It could do some kind of binary proxy-or-don't-proxy based on a valid assertion from the IdP, but the application would just attribute all traffic to a single user.
  Or is there some kind of gateway standard that I'm unaware of?
  
  rad_gruchalski 2 years ago
  
  Have a look at something called external, or forward auth. For example 1) Traefik: https://doc.traefik.io/traefik/middlewares/http/forwardauth/, 2) Nginx: https://docs.nginx.com/nginx/admin-guide/security-controls/c..., 3) Envoy: https://www.envoyproxy.io/docs/envoy/latest/configuration/li....
  This can be used to add whatever authn/authz you require to apps that don't even support authn/authz. I'm using Traefik ForwardAuth with Keycloak for Jaeger SSO in a couple of places.
  
  Fiahil 2 years ago
  
  Yes, OAuth-proxy, Nginx external auth, ...
  
  Aeolun 2 years ago
  
  That would be so nice. But if they’re planning to ‘add it to our enterprise plans’ then I doubt your PR would be accepted. Leaving you to manage a fork.
  
  mdaniel 2 years ago
  
  If they truly wanted to build an open source community around their product, the "enterprise" part would just be "we host it for you" and not "we gatekeep features that we think we can extort big companies to pay"
  That is: I wouldn't hold off on a PR just because they said they're going to get around to it; if the PR works, and is merged, that's one less part they have to write. If they don't merge it, then the bad faith you're discussing will be a concrete fact and not speculation, and will serve as a warning to others not to bother submitting more PRs
  
  sheen 2 years ago
  
  Interesting points. Sounds like you've got some sound experience in the politics of open source
pranay01 2 years ago

feel free to test it out. We have an active slack community as well if you have any questions on how to setup etc. https://signoz.io/slack

daigoba66 2 years ago

Curious how it compares to https://github.com/uptrace/uptrace

And are there any other similar projects.

InTheArena 2 years ago

Is there any outlier analysis on this? That's a key advantage of DataDog and NewRelic.

pranay01 2 years ago

By outlier analysis, do you mean anomaly detection in metrics to send alerts?
As of now, we have fixed threshold based alerting capabilities - but more advanced ML/seasonality based anomaly detection is on the roadmap. we are tracking this here - https://github.com/SigNoz/signoz/issues/295
What type of outlier analysis do you generally do in DataDog/NewRelic?
- InTheArena 2 years ago
  
  Week over week, or day over day values that are more then x deviations from norm.

gclawes 2 years ago

How's this compare to the grafana/prometheus/tempo/loki set of tools?

pranay01 2 years ago

You can find answers to some similar questions here - https://signoz.io/docs/faqs/product/#how-is-signoz-different...
Let me know if you have a specific question which is not addressed in the above link and will try to answer

dominotw 2 years ago

good to see solid oss projects coming out of india. I will def keep an eye on this one.

pranay01 2 years ago

thanks. do give it a try and let us know if you have any feedback

tomschwiha 2 years ago

Looks interesting, will have a look at it. Thank you!

pranay01 2 years ago

Awesome! Do give it a try and let us know what do you think.
We also have an active slack community if you have any questions on how to set up or have an feedback for us - https://signoz.io/slack

jaxn 2 years ago

I am hesitant to run my APM on the same infra as our application. I love the idea of reducing an external dependency/cost, and it looks easy enough to add to our Kubernetes. It just seems a little like host our own status page on the same servers as our SaaS.

Is this something you have an answer for?

pranay01 2 years ago

We recommend users to run SigNoz in a separate k8s cluster/VM. In this way, even if your applications servers/clusters get overloaded, your observability stack (SigNoz) will still be running seamlessly.
Many of our users use SigNoz in a similar fashion.

chrisandchris 2 years ago

Question: Is it licensed under MIT? The license reads as it could be MIT but there is no reference to it.

pranay01 2 years ago

The ee/ folder is under proprietary license. Rest of the code is under MIT license
- chrisandchris 2 years ago
  
  Thanks for the response. I've realized that the license between the branch develop and main is not the same at the moment.
ankitnayan 2 years ago

https://github.com/SigNoz/signoz/blob/main/LICENSE
- chrisandchris 2 years ago
  
  Does not match the file I see when clicking "View License", e.g. will not match the LICENSE file the next release may have?
  https://github.com/SigNoz/signoz/blob/develop/LICENSE
  
  pranay01 2 years ago
  
  yes, upcoming release will have license which is currently in `develop` branch

skanga 2 years ago

I tried installation on Windows 10 via Rancher-Desktop using "other platform" docs but ran into issues with dependencies on sh/bash. Are there any windows specific instructions?

pranay01 2 years ago

hey, windows is not officially supported. If you have a Ubuntu machine/Mac, you can try in that
https://signoz.io/docs/install/docker/#prerequisites
- skanga 2 years ago
  
  Gotcha, thanks. Do the linux instructions work on any WSL distro?
  
  thegagne 2 years ago
  
  I got it running on WSL w/ docker using docker compose. I didn’t do much more than get it running though.
  
  pranay01 2 years ago
  
  Awesome! Let us know if you have nay feedback when you get a chance to dive deeper. Here's our slack community- https://signoz.io/slack

mritchie712 2 years ago

How are you dealing with join's in clickhouse? Do you just avoid them all together?

We use clickhouse at Luabase and join performance is the only weak point we've hit.

ankitnayan 2 years ago

Yes...we are avoiding join altogether. Currently we used join in timeseries but we are probably moving away from that due to perf. Single table is amazingly fast

gregwebs 2 years ago

Click house is a great DB but still not the best at storing timeseries metrics. Do you have plans to incorporate a db optimized for time series storage?

RhodesianHunter 2 years ago

Clickhouse is absolutely incredible at storing time series metrics.
- ankitnayan 2 years ago
  
  What does your schema for metrics look like? Using materialised views? I am particularly interested in storing metrics with labels(key/val pairs) Eg, Prometheus metrics. You can't flatten them out into columns due to high dimensionality(can need millions of columns). Do you store the labels in an array?
  
  datalopers 2 years ago
  
  Not the person you're replying to, but you can see how SigNoz did it here [1]
  Essentially you add: keys Array(String), values Array(String) and write the pairs accordingly.
  https://github.com/SigNoz/signoz/blob/develop/deploy/docker-...
datalopers 2 years ago

I see this viewpoint occasionally but have never seen what features a TSDB offers that Clickhouse can’t do just as effectively if not better?
- hiyer 2 years ago
  
  I would have thought better compression was one, but it looks like Clickhouse now supports all the time series-optimized encodings like delta and gorilla - https://altinity.com/blog/2019/7/new-encodings-to-improve-cl....

FridgeSeal 2 years ago

This looks really good!

Might give this a spin next week, currently using NR, but it’s slow, expensive and the in-cluster collection services are frustratingly fragile.

pranay01 2 years ago
thanks, if you have any questions while setting things up, feel free to drop by on our slack community - https://signoz.io/slack
Also, can you explain a bit more on
```
  in-cluster collection services are frustratingly fragile
```
do you mean the agent they use for sending application metrics from clusters breaks down?
- FridgeSeal 2 years ago
  
  > do you mean the agent they use for sending application metrics from clusters breaks down?
  Yeah basically, a number of the kubernetes deployments are configured with too-low memory and resource limits, so they're constantly crashing (and triggering all sorts of alerting false-positives). The limits are semi-hardcoded, so we can't override them, and at the moment, the effort required to fix the deeply-interlinked helm chart they provide isn't worth it. The NRI-Bundle helm chart also installs a lot into your cluster - it brings its own NATS instance, there's multiple daemon sets, etc. I'd likely get quite a lot of spare compute back by swapping to something lighter.

zrosenbauer 2 years ago

Logs be SUPER expensive... our biggest freaking expense

pranay01 2 years ago

what do you use currently for logs?

yawniek 2 years ago

what drove you to move away from MIT licencing?

pranay01 2 years ago

Since we follow an open core business model, we wanted the flexibility to introduce proprietary features in the future. Hence, we added a folder ee/ which will have proprietary code.
The rest of the code is still MIT licensed. And if you remove the ee/ folder, the project would work without any issue
This licensing model is very similar to what folks like Gitlab and PostHog do today
- goodpoint 2 years ago
  
  > open core business model, we wanted the flexibility to introduce proprietary features in the future
  From the website: "Why get locked-in with SaaS vendors like DataDog when you can use Open source?"
  You expect users to trade one form of lock-in for another one?
  
  pranay01 2 years ago
  
  The lock-in we are pointing to is the lock-in because of the proprietary SaaS vendors instrumentation library which is embedded into your code and difficult to get out of.
  We are natively based on opentelemetry which is emerging as the industry standard for instrumentation. So, you can change product you used for backend and visualising and storing your telemetry data very easily
- lhoff 2 years ago
  
  Don't do that. Open Core is going to hurt you. Especially if things like SSO are paid only. I understand you want to make money and you deserve that but open core will hurt you more then it will help you.
  - Open core means no direct path from testing to using (and paying). If I want to make a case for a software to be included in our stack I don't want to constantly run into paywalls while trying out the product but I also don't want to go through the hoops of getting a licence (because of internal bureaucracy).
  - You steer away users that cost you nothing but help you spread to word and report issues and might even provide a PR. I'm talking about OpenSource Projects, student organisations and companies with limited resources (NGOs, early stage startup's). I used to be head admin in a student organisation. We developed our own internal tools and only relied on open-source components. I know of at least two cases where other students got to know the tools and after they finished university joined companies and introduced the very tools we used to their employees resulting in paid support contracts.
  - It creates tempting opportunities for investors to force you into ruining the open source tier. In the beginning you only have features behind the paywall that are useful to big enterprises but if your business is not growing fast enough (from the perspective of investors) they might force you to push more users to be paying customers. That might help in the short term but will ruin your reputation in the long run.
  - Paying for services (aka insurance to get help if something goes haywire) is easier to justify with execs then paying an unreasonable amount for that one feature that is behind the paywall. ("Can't you just make it work without it"). Its a purely psychological argument but decision processes in companies are not always rational and your allies are the devs and you should be helping them make a case to buy your product.
  
  pranay01 2 years ago
  
  If I want to make a case for a software to be included in our stack I don't want to constantly run into paywalls while trying out the product but I also don't want to go through the hoops of getting a licence (because of internal bureaucracy).
  Yeah, understand the use cases you are pointing to. We are planning to introduce a foss only version of the product for users who know that they won't be need the enterprise version. It will not have any enterprise bits, and you could just keep using it as you want. Something similar to this - https://github.com/PostHog/posthog-foss
- goodpoint 2 years ago
  
  What a pity. I could have used it if it was under GPLv3 or AGPL.
  
  bigcat12345678 2 years ago
  
  AGPL is contagious I heard, seems quite dangerous to ppl planning to make preparatory software bolted onto an open source core.
  
  goodpoint 2 years ago
  
  That's wrong. Licenses are not bacteria.
  First, AGPL only requires to release changes ONLY to the existing AGPL codebase and ONLY if you are providing it as a network service.
  Second, the whole idea of virality is a huge misnomer. There is no such thing as one thing "infecting" another in copyright law. GPL/AGPL cannot magically make another piece of software change license.