The revolution will be verbosely {,b}logged

Analyzing Heroku Router Logs with Papertrail

Posted by Jorge Orpinel on

What are some common problems that can be detected with the handy router logs on Heroku? We’ll explore several of them and show you how to address them quickly and easily by monitoring Heroku with SolarWinds Papertrail.

One of the first cloud platforms, Heroku is a popular platform as a service (PaaS) that has been in development since June 2007. It allows developers and DevOps specialists to easily deploy, run, manage, and scale applications written in Ruby, Node.js, Java, Python, Clojure, Scala, Go, and PHP.

To learn more about Heroku, head to the Heroku Architecture documentation.

Intro to Heroku Logs

Logging in Heroku is modular, similar to gathering system performance metrics. Logs are time-stamped events that can come from any of the processes running in all application containers (Dynos), system components, or backing services. Log streams are aggregated and fed into the Logplex—a high-performance, real-time system for log delivery into a single channel.

Run-time activity, as well as dyno restarts and relocations, can be seen in the application logs. These include logs generated from within application code deployed on Heroku, from services like the web server or the database, and from the app’s libraries. Scaling, load, and memory usage metrics, among other structural events, can be monitored with system logs, which collect messages about actions taken by the Heroku platform infrastructure on behalf of your app. Application logs and system logs are two of the most common types of logs available on Heroku.

To fetch logs from the command line, we can use the heroku logs command. More details on this command, such as output format, filtering, or ordering logs, can be found in the Logging article of Heroku Devcenter.

$ heroku logs
2018-09-16T15:13:46.677020+00:00 app[web.1]: Processing PostController#list (for 208.39.138.12 at 2018-09-16 15:13:46) [GET]
2018-09-16T15:13:46.677902+00:00 app[web.1]: Rendering post/list
2018-09-16T15:13:46.698234+00:00 app[web.1]: Completed in 74ms (View: 31, DB: 40) | 200 OK [http://myapp.heroku.com/]
2018-09-16T15:13:46.723498+00:00 heroku[router]: at=info method=GET path="/posts" host=myapp.herokuapp.com fwd="204.204.204.204" dyno=web.1 connect=1ms service=18ms status=200 bytes=975

# © 2018 Salesforce.com. All rights reserved.

Heroku Router Logs

Router logs are a special case of logs that exist somewhere between the app logs and the system logs—and are not fully documented on the Heroku website at the time of writing. They carry information about HTTP routing within Heroku Common Runtime, which manages dynos isolated in a single multi-tenant network. Dynos in this network can only receive connections from the routing layer. These routes are the entry and exit points of all web apps or services running on Heroku dynos.

Tail only the router logs with the heroku logs -tp router CLI command.

$ heroku logs -tp router
2018-08-09T06:24:04.621068+00:00 heroku[router]: at=info method=GET path="/db" host=quiet-caverns-75347.herokuapp.com request_id=661528e0-621c-4b3e-8eef-74ca7b6c1713 fwd="104.163.156.140" dyno=web.1 connect=0ms service=17ms status=301 bytes=462 protocol=https
2018-08-09T06:24:04.902528+00:00 heroku[router]: at=info method=GET path="/db/" host=quiet-caverns-75347.herokuapp.com request_id=298914ca-d274-499b-98ed-e5db229899a8 fwd="104.163.156.140" dyno=web.1 connect=1ms service=211ms status=200 bytes=3196 protocol=https
2018-08-09T06:24:05.002308+00:00 heroku[router]: at=info method=GET path="/stylesheets/main.css" host=quiet-caverns-75347.herokuapp.com request_id=43fac3bb-12ea-4dee-b0b0-2344b58f00cf fwd="104.163.156.140" dyno=web.1 connect=0ms service=3ms status=304 bytes=128 protocol=https
2018-08-09T08:37:32.444929+00:00 heroku[router]: at=info method=GET path="/" host=quiet-caverns-75347.herokuapp.com request_id=2bd88856-8448-46eb-a5a8-cb42d73f53e4 fwd="104.163.156.140" dyno=web.1 connect=0ms service=127ms status=200 bytes=7010 protocol=https

# Fig 1. Heroku router logs in the terminal

Heroku routing logs always start with a timestamp and the “heroku[router]” source/component string, and then a specially formatted message. This message begins with either “at=info”, “at=warning”, or “at=error” (log levels), and can contain up to 14 other detailed fields such as:

  • Heroku error “code” (optional) - Included for all errors and warnings, and for some info messages; these Heroku-specific error codes complement the HTTP status codes
  • Error “desc” (optional) - Description of the error, paired with the code above
  • HTTP request “method”, e.g., GET or POST - May be related to some issues
  • HTTP request “path” - URL location for the request; useful for knowing where to check in the application code
  • HTTP request “host” - Host header value
  • The Heroku HTTP Request ID - Can be used to correlate router logs to application logs
  • HTTP request “fwd” - X-Forwarded-For header value
  • Which “dyno” serviced the request - Useful for troubleshooting specific containers
  • “connect” - Time (ms) spent establishing a connection to the web server(s)
  • “service” - Time (ms) spent proxying data between the client and the web server(s)
  • HTTP response code or “status” - Quite informative in case of issues
  • Number of “bytes” transferred in total for this web request

Common Problems Observed with Router Logs

Below are examples of common problems surfaced by router logs, along with typical ways to address them.

Common HTTP Status Codes

404 Not Found Error

Problem: Error accessing nonexistent paths (regardless of HTTP method):

2018-07-30T17:10:18.998146+00:00 heroku[router]: at=info method=POST path="/saycow" host=heroku-app-log.herokuapp.com request_id=e5634f81-ec54-4a30-9767-bc22365a2610 fwd="187.220.208.152" dyno=web.1 connect=0ms service=15ms status=404 bytes=32757 protocol=https
2018-07-27T22:09:14.229118+00:00 heroku[router]: at=info method=GET path="/irobots.txt" host=heroku-app-log.herokuapp.com request_id=7a32a28b-a304-4ae3-9b1b-60ff28ac5547 fwd="187.220.208.152" dyno=web.1 connect=0ms service=31ms status=404 bytes=32769 protocol=https

Solution: Implement or change those URL paths in the application or add the missing files.

500 Server Error

Problem: There’s a bug in the application:

2018-07-31T16:56:25.885628+00:00 heroku[router]: at=info method=GET path="/" host=heroku-app-log.herokuapp.com request_id=9fb92021-6c91-4b14-9175-873bead194d9 fwd="187.220.247.218" dyno=web.1 connect=0ms service=3ms status=500 bytes=169 protocol=https

Solution: The application logs have to be examined to determine the cause of the internal error in the application’s code. Note that HTTP Request IDs can be used to correlate router logs against the web dyno logs for that same request.
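
For example, assuming your application echoes the X-Request-ID header into its own log lines (a common practice on Heroku), you can pull the router entry and the matching dyno output for a failing request in one pass. This is a rough sketch reusing the request ID from the log line above:

$ heroku logs -n 1500 | grep 9fb92021-6c91-4b14-9175-873bead194d9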

Common Heroku Error Codes

Other problems commonly detected by router logs can be explored in the Heroku Error Codes. Unlike HTTP codes, these error codes are not standard and only exist in the Heroku platform. They give more specific information on what may be producing HTTP errors.

H14 - No web dynos running

Problem: App has no web dynos setup:

2018-07-30T18:34:46.027673+00:00 heroku[router]: at=error code=H14 desc="No web processes running" method=GET path="/" host=heroku-app-log.herokuapp.com request_id=b8aae23b-ff8b-40db-b2be-03464a59cf6a fwd="187.220.208.152" dyno= connect= service= status=503 bytes= protocol=https

Notice that the above case is an actual error message, which includes both Heroku error code H14 and a description. HTTP 503 means “service currently unavailable.”

Note that Heroku router error pages can be customized. These apply only to errors where the app doesn’t respond to a request, e.g., a 503.

Solution: Use the heroku ps:scale command to start the app’s web server(s).
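
For example (a minimal sketch; adjust the dyno count to your app’s needs):

$ heroku ps:scale web=1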

H12 - Request timeout

Problem: There’s a request timeout (app takes more than 30 seconds to respond):

2018-08-18T07:11:15.487676+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=GET path="/sleep-30" host=quiet-caverns-75347.herokuapp.com request_id=1a301132-a876-42d4-b6c4-a71f4fe02d05 fwd="189.203.188.236" dyno=web.1 connect=1ms service=30001ms status=503 bytes=0 protocol=https

Error code H12 indicates the app took over 30 seconds to respond to the Heroku router.

Solution: Code that requires more than 30 seconds must run asynchronously (e.g., as a background job) in Heroku. For more info read Request Timeout in the Heroku DevCenter.

H18 - Server Request Interrupted

Problem: The application received too many requests (server overload):

2018-07-31T18:52:54.071892+00:00 heroku[router]: sock=backend at=error code=H18 desc="Server Request Interrupted" method=GET path="/" host=heroku-app-log.herokuapp.com request_id=3a38b360-b9e6-4df4-a764-ef7a2ea59420 fwd="187.220.247.218" dyno=web.1 connect=0ms service=3090ms status=503 bytes= protocol=https

Solution: This problem may indicate that the application needs to be scaled up, or the app performance improved.

H80 - Maintenance mode

Problem: Maintenance mode generates an info router log with error code H80:

2018-07-30T19:07:09.539996+00:00 heroku[router]: at=info code=H80 desc="Maintenance mode" method=GET path="/" host=heroku-app-log.herokuapp.com request_id=1b126dca-1192-4e98-a70f-78317f0d6ad0 fwd="187.220.208.152" dyno= connect= service= status=503 bytes= protocol=https

Solution: Disable maintenance mode with heroku maintenance:off.

Papertrail

Papertrail™ is a cloud log management service designed to aggregate Heroku app logs, text log files, and syslogs, among many others, in one place. It helps you to monitor, tail, and search logs via a web browser, command-line, or an API. The Papertrail software analyzes log messages to detect trends, and allows you to react instantly with automated alerts.

The Event Viewer is a live aggregated log tail with auto-scroll, pause, search, and other unique features. Everything in log messages is searchable, and new logs still stream in real time in the event viewer when searched (or otherwise filtered). Note that Papertrail reformats the timestamp and source in its Event Viewer to make it easier to read.

Viewer Live Pause
Fig 2. The Papertrail Event Viewer. © 2018 Solarwinds. All rights reserved.

Provisioning Papertrail on your Heroku apps is extremely easy: run heroku addons:create papertrail from the terminal. (See the Papertrail article in Heroku’s DevCenter for more info.) Once set up, the add-on can be opened from the Heroku app’s dashboard (Resources section) or with heroku addons:open papertrail in the terminal.

Troubleshooting Routing Problems Using Papertrail

A great way to examine Heroku router logs is by using the Papertrail solution. It’s easy to isolate them in order to filter out all the noise from multiple log sources: simply click on the “heroku/router” program name in any log message, which will automatically search for “program:heroku/router” in the Event Viewer:

Heroku router viewer
Fig 3. Tail of Heroku router logs in Papertrail, 500 app error selected. © 2018 SolarWinds. All rights reserved.

Monitor HTTP 404s

How do you know that your users are finding your content, and that it’s up to date? 404 Not Found errors are what a client receives when the URL’s path is not found. Examples would be a misspelled file name or a missing app route. We want to make sure these types of errors remain uncommon, because otherwise, users are either walking into dead ends or seeing irrelevant content in the app!

With Papertrail, setting up an alert to monitor the number of 404s returned by your app is easy and convenient. One way to do it is to search for “status=404” in the Event Viewer, and then click on the Save Search button. This will bring up the Save Search popup, along with the Save & Setup Alert option:

Save a search
Fig 4. Save a log search and set up an alert with a single action © 2018 SolarWinds. All rights reserved.

The following screen presents the alert delivery options, such as email, Slack messages, or push notifications. You can even publish all matching events as a custom metric to application performance management tools such as AppOptics™.

Troubleshoot 500 errors quickly

500 error on Heroku
Fig 5. HTTP 500 Internal Server Error from herokuapp.com. © 2018 Google LLC. All rights reserved.

Let’s say an HTTP 500 error is happening on your app after it’s deployed. A great Papertrail feature is that it makes the request_id in log messages clickable. Simply click it, or copy it and search for it in the Event Viewer, to find all the app logs behind the internal problem, along with the detailed error message from your application’s code.

Conclusion

Heroku router logs are the glue between web traffic and (sometimes intangible) errors in your application code. It makes sense to give them special focus when monitoring a wide range of issues because they often indicate customer-facing problems that we want to avoid or address ASAP. Add the Papertrail add-on to Heroku to get more powerful ways to monitor router logs.

Sign up for a 30-day free trial of Papertrail and start aggregating logs from all your Heroku apps and other sources. You can learn more about Papertrail’s advanced features in its Heroku Dev Center article.

Papertrail for Python Logs

Posted by Greg Sandler on

When you’re troubleshooting a problem or tracking down a bug in Python, the first place to look for clues related to server issues is in the application log files.

Python includes a robust logging module in the standard library, which provides a flexible framework for emitting log messages. This module is widely used by various Python libraries and is an important reference point for most programmers when it comes to logging.

The Python logging module provides a way for applications to configure different log handlers and provides a standard way to route log messages to these handlers. As the Python.org documentation notes, there are four basic classes defined by the Python logging module: Loggers, Handlers, Filters, and Formatters. We’ll provide more details on these below.

Getting Started with Python Logs

There are a number of important steps to take when setting up your logs. First, you need to ensure logging is enabled in the applications you use. You also need to categorize your logs by name so they are easy to maintain and search. Naming the logs makes it easier to search through large log files, and to use filters to find the information you need.

To send log messages in Python, request a logger object. It should have a unique name to help filter and prioritize how your Python application handles various messages. We’ll also add a StreamHandler to print log output to the console. Here’s a simple example:

import logging
logging.basicConfig(handlers=[logging.StreamHandler()])
log = logging.getLogger("test")
log.error("Hello, world")

This outputs:

ERROR:test:Hello, world

This message consists of three fields. The first, ERROR, is the log level. The second, test, is the logger name. The third field, “Hello, world”, is the free-form log message.

Most problems in production are caused by unexpected or unhandled issues. In Python, such problems generate tracebacks where the interpreter tries to include all important information it could gather. This can sometimes make the traceback a bit hard to read, though. Let’s look at an example traceback. We’ll call a function that isn’t defined and examine the error message.

def test():
    nofunction()
test()

Which outputs:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in test
NameError: global name 'nofunction' is not defined

This shows the common parts of a Python traceback. The error message is usually at the end of the traceback. It says “nofunction is not defined,” which is what we expected. The traceback also includes the lines of all stack frames that were touched when this error occurred. Here we can see that it occurred in the test function on line two. Stdin means standard input and refers to the console where we typed this function. If we were using a Python source file, we’d see the file name here instead.

Configuring Logging

You should configure the logging module to direct messages to go where you want them. For most applications, you will want to add a Formatter and a Handler to the root logger. Formatters let you specify the fields and timestamps in your logs. Handlers let you define where they are sent. To set these up, Python provides a nifty factory function called basicConfig.

import logging

# Set the root logger level so the debug message below is actually emitted
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(message)s',
                    handlers=[logging.StreamHandler()])
logging.debug('Hello World!')

By default, Python will output uncaught exceptions to your system’s standard error stream. Alternatively, you could add a handler to the excepthook to send any exceptions through the logging module. This gives you the flexibility to provide custom formatters and handlers. For example, here we log our exceptions to a log file using the FileHandler:

import logging
import sys

logger = logging.getLogger('test')
fileHandler = logging.FileHandler("errors.log")
logger.addHandler(fileHandler)

def my_handler(type, value, tb):
  logger.exception("Uncaught exception: {0}".format(str(value)))

# Install exception handler
sys.excepthook = my_handler

# Throw an error
nofunction()

Which results in the following log output:

$ cat errors.log
Uncaught exception: name 'nofunction' is not defined
None

In addition, you can filter logs by configuring the log level. One way to set the log level is through an environment variable, which gives you the ability to customize the log level in the development or production environment. Here’s how you can use the LOGLEVEL environment variable:

$ export LOGLEVEL='ERROR'
$ python
>>> import logging
>>> import os
>>> logging.basicConfig(level=os.environ.get('LOGLEVEL', 'WARNING'), handlers=[logging.StreamHandler()])
>>> logging.debug('Hello World!') # prints nothing
>>> logging.error('Hello World!')
ERROR:root:Hello World!

Logging from Modules

Modules intended for use by other programs should only emit log messages. These modules should not configure how log messages are handled. A standard logging best practice is to let the Python application importing and using the modules handle the configuration.

Another standard best practice to follow is that each module should use a logger named like the module itself. This naming convention makes it easy for the application to distinctly route various modules and helps keep the log code in the module simple.

You need just two lines of code to set up logging using the named logger. Once you do this in Python, __name__ contains the full name of the current module, and the same pattern will work in any module. Here’s an example:

import logging
log = logging.getLogger(__name__)
def do_something():
    log.debug("Doing something!")
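
To illustrate the split in responsibilities, here’s a minimal sketch of the application side, assuming the module above is saved as a hypothetical mymodule.py; the application configures the root logger once, and the module’s named logger simply propagates its records to that configuration:

# app.py - the application owns the logging configuration
import logging

import mymodule  # hypothetical module containing the code above

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(name)s %(levelname)s %(message)s',
)

mymodule.do_something()  # emits "Doing something!" through the "mymodule" logger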

Analyzing Your Logs with Papertrail

Python applications on a production server often contain millions of lines of log entries. Command line tools like tail and grep are often useful during the development process. However, they may not scale well when analyzing millions of log events spread across multiple servers.

Centralized logging can make it easier and faster for developers to manage a large volume of logs. By consolidating log files onto one integrated platform, you can eliminate the need to search for related data that is split across multiple apps, directories, and servers. Also, a log management tool can alert you to critical issues, helping you more quickly identify the root cause of unexpected errors, as well as bugs that may have been missed earlier in the development cycle.

For production-scale logging, a log management tool such as Solarwinds® Papertrail™ can help you better manage your data. Papertrail is a cloud-based platform designed to handle logs from any Python application, including Django servers.

The Papertrail solution provides a central repository for event logs. It helps you consolidate all of your Python logs using syslog, along with other application and database logs, giving you easy access all in one location. It offers a convenient search interface to find relevant logs. It can also stream logs to your browser in real time, offering a “live tail” experience. Check out the tour of the Papertrail solution’s features.

Papertrail is designed to help minimize downtime. You can receive alerts via email, or send them to Slack, Librato, PagerDuty, or any custom HTTP webhooks of your choice. Alerts are also accessible from a web page that enables customized filtering. For example, you can filter by name or tag.

Configuring Papertrail in Your Application

There are many ways to send logs to Papertrail depending on the needs of your application. You can send logs through journald, log files, Django, Heroku, and more. We will review the syslog handler below.

Python can send log messages directly to Papertrail with the Python SysLogHandler. Just set the endpoint to the log destination shown in your Papertrail settings. You can optionally format the timestamp or set the log level as shown below.

import logging
import sys
from logging.handlers import SysLogHandler

syslog = SysLogHandler(address=('logsN.papertrailapp.com', XXXXX))
format = '%(asctime)s YOUR_APP: %(message)s'
formatter = logging.Formatter(format, datefmt='%b %d %H:%M:%S')
syslog.setFormatter(formatter)

logger = logging.getLogger()
logger.addHandler(syslog)
logger.setLevel(logging.INFO)

def my_handler(type, value, tb):
  logger.exception("Uncaught exception: {0}".format(str(value)))

# Install exception handler
sys.excepthook = my_handler

logger.info("This is a message")
nofunction() #log an uncaught exception

Conclusion

Python offers a well-thought-out framework for logging that makes it simple to enable and manage your log files. Getting started is easy, and a number of tools baked into Python automate the logging process and help ensure ease of use.

Papertrail adds even more functionality and tools for diagnostics and analysis, enabling you to manage your Python logs on a centralized cloud server. Quick to setup and easy to use, Papertrail consolidates your logs on a safe and accessible platform. It simplifies your ability to search log files, analyze them, and then act on them in real time—so that you can focus on debugging and optimizing your applications.

Learn more about how Papertrail can help optimize your development ecosystem.

Best Practices for Logging in Node.js

Posted by Gregory Sandler on

Logging is one of those Node.js functions that’s easy to take for granted — until some unexpected event results in hours, sometimes days, of searching through log files to pinpoint the source of a problem.

Node.js apps create basic text logs just like any other application. These logs can be easily generated using open source logging libraries, such as the Winston or Bunyan libraries. These libraries work fine for managing relatively small applications, but developers creating massive amounts of logs on a production server need to be able to quickly and efficiently troubleshoot infrastructure and application issues.

Logging Best Practices

There are a number of best practices to follow when setting up your logs. First, you should ensure logging is enabled in the application you are using. Below we will show you how you can turn on logging using popular libraries such as Winston and Bunyan.

Next, it’s important to categorize your logs so they are easy to maintain and search. Categorization can facilitate the process of searching through large log files, and filtering for the information you need. You’ll see some examples of how to do this using groups and attributes.

You should also log uncaught exceptions, because if an error event fires and there is no listener for it, it propagates to the Node.js process, which prints the stack trace and crashes. To avoid this, you can listen for the uncaughtException event on the process object or configure your logging framework to handle it. We’ll also show you how to set this up with Winston later.
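
In the meantime, here’s a minimal sketch of the process-level listener (console.error stands in for whichever logging library you choose):

// Log the error before exiting; exit explicitly, because the process is in an
// undefined state after an uncaught exception.
process.on('uncaughtException', (err) => {
  console.error('Uncaught exception:', err);
  process.exit(1);
});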

Node applications on a production server often contain millions of lines of log entries. When you need to troubleshoot for any reason, it can be a daunting task. During the development process, command line tools provide an easy interface for console-level log review. But analyzing large volumes of data on a production server requires an automated process that enables you to easily query and analyze your logs.

For production-scale logging, you need a robust set of automated troubleshooting tools that allows you to go beyond console-level review. In a production environment, it’s important to maintain a central log file and/or stream your logs to a management tool. In addition to notifying you about new issues, a log management tool can help you identify the source of unexpected errors, as well as bugs that may have been missed earlier in the development cycle.

The tools best-suited for your needs will be influenced by the way in which you manage your data. If your data is in an Apache log format such as Morgan logs, you’ll be better off with a management tool that can understand and search the Apache format natively. If the data is in JSON format, you need an analysis tool that understands JSON.

Use a Logging Library to Add Functionality

Popular node log management solutions, such as Bunyan and Winston, are designed to add functionality to the logging process. For example, you can format JSON output using Bunyan, a well-tested tool for node logging. Bunyan lets you create and format JSON versions of your logs. This is especially useful if you plan to use external services, such as Papertrail™, to manage, read, and analyze your logs.

Bunyan is a fast and easy-to-use logging module that includes a process id (pid), hostname, and timestamp when sending log messages. You can use Bunyan for custom transports in order to log data to various locations, such as Syslog daemons, Slack notifications, and rotating files.

You can also use Bunyan to log based on priority level. Bunyan also facilitates the creation of specialized loggers, which it refers to as child loggers, allowing for log customization. For example, this enables you to log different information in your development and production environments.
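
Here’s a rough sketch of a child logger, assuming the environment name is read from NODE_ENV; every record written through the child automatically carries the extra field:

var bunyan = require('bunyan');
var log = bunyan.createLogger({name: 'papertrail-test', level: 'info'});

// Child loggers inherit the parent's configuration and add their own fields.
var envLog = log.child({env: process.env.NODE_ENV || 'development'});
envLog.info('Handling request');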

To use Bunyan, create and name a Logger. You can use Bunyan’s logging API to customize the output format you prefer.

var bunyan = require('bunyan');
var log = bunyan.createLogger({name: 'papertrail-test'});
var bind = 3000; // example port value used in the metadata below
log.info('Welcome to the app'); // Just text
log.warn({port: bind}, 'port'); // Text plus metadata

This example, using Bunyan, demonstrates how the log level can be encoded using the number in the “level” property of the resulting JSON object.

var bunyan = require('bunyan');
var log = bunyan.createLogger({name: 'papertrail', level: 'debug'});
log.info("Book 'Catch 22' has been added to the database");

It will produce the following log entry:

{
  "name": "papertrail",
  "hostname": "mainserv",
  "pid": 32411,
  "level": 30,
  "msg": "Book 'Catch 22' has been added to the database",
  "time": "2018-05-19T23:57:12.112Z",
  "v": 0
}

Another logging option that can be particularly useful for handling complex application server logs is Winston. Using Winston, you can define and customize your logs. This includes defining destinations, adding metadata, and maintaining multiple asynchronous logs.

Winston is one of the most popular logging libraries available for Node.js. It has a large and robust feature set that is easy to install, configure, and use. Among its many features are the ability to use multiple transports, create custom transports, stream logs, and query logs.

For instance, with Winston you can define a custom logger that includes a timestamp, uses uppercase for the logging level, and adds specific metadata you define in a log message. Here are some examples of this process in action:

// Create custom logger
var winston = require('winston');

var logger = new (winston.Logger)({
  transports: [
    new(winston.transports.Console)({
      timestamp: function() {
        return Date.now();
      },
      formatter: function(options) {
        return options.timestamp() + ' ' + options.level.toUpperCase() +
          ' ' + (options.message ? options.message : '') +
          (options.meta && Object.keys(options.meta).length ? '\n\t' +
          JSON.stringify(options.meta) : '');
      }
    })
  ]
});

The following is an example of log level encoding using Winston:

var winston = require('winston');
winston.level = 'debug';
winston.log('info', "Book 'Catch 22' has been added to the database");

It will produce the following log:

info: Book 'Catch 22' has been added to the database

You can also log uncaught exceptions by configuring Winston when you create the logger.

const { createLogger, transports } = require('winston');

const logger = createLogger({
  transports: [
    new transports.File({ filename: 'combined.log' }) 
  ],
  exceptionHandlers: [
    new transports.File({ filename: 'exceptions.log' })
  ]
});

Troubleshoot Problems Fast Using Papertrail

Cloud-hosted monitoring solutions are an increasingly popular option for production-scale log management, aggregation, and search. SolarWinds® Papertrail is a cloud-hosted log management service, which is designed to integrate with popular logging library solutions and help make it easy for you to consolidate all of your Node.js logs in one place.

Papertrail is intended to help you centralize your log management in the cloud. You can quickly and easily use the Papertrail solution to configure centralized logging from Node.js apps. Papertrail can handle logs directly from the Node.js app using a regular text log file, or by integrating with logging libraries, such as Winston or Bunyan.

Papertrail is purpose-built to provide instant log visibility. It can typically be set up in less than a minute, and provides features such as fast search, integration with multiple alerting services, flexible system groups, team-wide access, and webhook monitoring.

In addition to functionality, Papertrail may add value to the log data you already collect. It is designed to make it easier for you to identify error messages, app requests, slow DB queries, config changes, and application anomalies.

Papertrail features numerous benefits for handling Node log files, including:

Log aggregation. Papertrail allows you to consolidate all of your Node logs with syslogs, and other application and database logs, giving you easy access to application syslogs, text logs, and Apache logs — all in one location.

Tail and search logs. With Papertrail, you can search stored or streaming log data using search query syntax, which can help you quickly troubleshoot issues. You also can search large volumes of log messages using the robust search capabilities built-in to Papertrail. The Papertrail GUI can also expedite the process of identifying mission-critical issues that may need to be addressed in a production environment.

Instant alert notifications. Papertrail is designed to help minimize downtime. You can receive alerts via email, or send them to Slack, Librato, PagerDuty, or any custom HTTP webhooks of your choice. Moreover, alerts can be accessed from a web page that enables customized filtering. For example, you can filter by name or tag.

Log analysis. Papertrail log archives can be loaded into third-party data management utilities, such as Redshift or Hadoop, to enable even more robust queries and analytics. With the Papertrail built-in log velocity analytics, you can visualize log throughput to quickly identify patterns and anomalies.

Scalability. You can also use Papertrail to scale to your log volume and desired searchable duration.

Encryption. Papertrail facilitates encrypted logs, which adds another layer of application security in the production environment. In addition, Papertrail supports optional TLS encryption and certificate-based destination host verification.

Configuring Node to send logs to Papertrail

Getting up and running with Papertrail is quick and easy. If you already have log files, you can transmit them to Papertrail using remote_syslog2. This small program will monitor the log files and send new lines to Papertrail.

> remote_syslog -D --tls -p 1234 /var/log/mysqld.log

If you prefer to use Winston, you can also send events asynchronously from Node.js using the Winston Papertrail transport. Update the host and port to the ones given to you in your Papertrail log destination settings.

var winston = require('winston');
require('winston-papertrail').Papertrail;

var winstonPapertrail = new winston.transports.Papertrail({
  host: 'logs.papertrailapp.com',
  port: 12345
});

var logger = new winston.Logger({
  transports: [winstonPapertrail]
});

If you prefer to use Bunyan, you can use the node-bunyan-syslog stream. Update the host and port to the ones given to you in your Papertrail log destination settings.

var bunyan = require('bunyan');
var bsyslog = require('bunyan-syslog');

var log = bunyan.createLogger({
  name: 'foo',
  streams: [{
    level: 'debug',
    type: 'raw',
    stream: bsyslog.createBunyanStream({
      type: 'sys',
      facility: bsyslog.local0,
      host: 'logs5.papertrailapp.com',
      port: 59738
    })
  }]
});

Conclusion

Papertrail goes beyond giving you centralized management of your logs. It can enable you to see more value from the log data you already collect. This is a log management tool designed to help you troubleshoot customer problems, resolve error messages, improve slow database queries, and track config changes. Moreover, Papertrail gives you analytical tools to identify and resolve system anomalies and potential security issues.

Learn more about how Papertrail can give you frustration-free log management in the cloud and sign up for a trial or the free plan to get started.

The Papertrail trademarks, service marks, and logos are the exclusive property of Papertrail, Inc. or its affiliates. All other trademarks are the property of their respective owners.

Troubleshooting Kubernetes Using Logs

Posted by Andre Newman on

Maintaining a Kubernetes cluster is an ongoing challenge. While it tries to make managing containerized applications easier, it introduces several layers of complexity and abstraction. A failure in any one of these layers could result in crashed applications, resource overutilization, and failed deployments.

Fortunately, Kubernetes keeps comprehensive logs of cluster activity and application output. Kubernetes logging provides valuable insight into how containers, nodes, and Kubernetes itself are performing. Because it meticulously logs everything from Pods to ReplicaSets, it allows you to trace problems back to their source.

In this post, we’ll look at some common Kubernetes problems and how we can use logs to troubleshoot them.

Types of Logs

Kubernetes maintains two types of logs: application logs and cluster logs. Application logs are generated by Pods and contain the output of the application code running in a container. Cluster logs are generated by the Kubernetes engine and its components, such as the kubelet agent, the API server, and the node scheduler.

Deployment Errors

Kubernetes Deployments can fail for a number of reasons, and the effects can be seen immediately. The cluster log documents most deployment-related errors.

Invalid Images

Using the wrong image in a Pod declaration can prevent an entire Deployment from completing successfully. An image error can be as simple as a misspelled image name or tag, or it could indicate a failure to download the image from a registry.

For example, let’s try to create an Nginx Pod:

$ kubectl run nginx --image=nginx:0.10

While there is an Nginx image on Docker Hub, there is no image with the “0.10” tag. kubectl runs successfully, but the Pod shows a status of ImagePullBackOff:

$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
nginx-cbdb859bb-gn2h8   0/1     ImagePullBackOff   0          1m

Kubernetes cluster logging has more information about the error:

Jul 23 12:33:43 minikube kubelet: E0723 16:33:42.826691 2740 pod_workers.go:186] Error syncing pod 2771d596-8e96-11e8-a08e-080027ced1d7 ("nginx-678c57c5cc-jcs5d_default(2771d596-8e96-11e8-a08e-080027ced1d7)"), skipping: failed to "StartContainer" for "nginx" with ImagePullBackOff: "Back-off pulling image \"nginx:0.10\""
...
Jul 23 12:34:24 minikube kubelet: E0723 16:34:24.472171 2740 remote_image.go:108] PullImage "nginx:0.10" from image service failed: rpc error: code = Unknown desc = Error response from daemon: manifest for nginx:0.10 not found

Private registries are more complicated, since each node in your cluster must authenticate with the registry before pulling images. For example, let’s upload a modified Alpine Linux image to Google Container Registry. Although our nodes can reach the registry, they aren’t authorized to pull images from it. As with Nginx, the Alpine Pod is created, but its status is ImagePullBackOff and an error appears in the cluster log:

Jul 23 11:34:27 minikube kubelet: E0723 15:34:26.869958 2740 pod_workers.go:186] Error syncing pod f5e6d267-8e88-11e8-a08e-080027ced1d7 ("my-alpine_default(f5e6d267-8e88-11e8-a08e-080027ced1d7)"), skipping: failed to "StartContainer" for "my-alpine" with ImagePullBackOff: "Back-off pulling image \"gcr.io/troubleshooting-k8s/alpine\""

Insufficient Resources

Kubernetes tries to schedule Pods in a way that optimizes CPU and RAM usage, but once a resource is exhausted across the cluster, nodes can start to become unstable. New Pods can no longer be deployed, and Kubernetes will start evicting existing Pods. This can cause significant problems for applications, nodes, and the cluster itself.

To demonstrate this, we used the docker-stress container to exhaust RAM on our node. We configured the Pod to use 4 GB of RAM, then replicated it eight times:

# stress.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress
spec:
  selector:
    matchLabels:
      app: stress
  replicas: 8
  template:
    metadata:
      labels:
        app: stress
    spec:
      containers:
      - name: stress
        image: progrium/stress:latest
        imagePullPolicy: Always
        args: ["--vm", "1", "--vm-bytes", "4096M"]

$ kubectl create -f stress.yaml

This initially caused several components to time out, including the controller itself:

Jul 26 11:11:11 kube-controller-manager kube-system[kube-controller-manager-minikube]: E0726 15:11:11.869623 1 replica_set.go:450] Sync "default/stress-66479cfcf7" failed with the server was unable to return a response in the time allotted, but may still be processing the request (get replicasets.extensions stress-66479cfcf7)

As node resources become exhausted, Kubernetes will try rescheduling Pods to higher-capacity nodes. If none are available, Kubernetes will start to evict Pods, and replacement Pods are left stuck in the Pending state. You can prevent this by setting Pod resource limits, reducing the number of replicas, or increasing the cluster’s capacity.
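
For example, here’s a hedged sketch of per-container requests and limits that could be added to the stress Deployment’s container spec above (the values are illustrative only):

# Under spec.template.spec.containers[0] in stress.yaml
resources:
  requests:
    memory: "256Mi"  # the scheduler reserves this much memory per replica
    cpu: "250m"
  limits:
    memory: "512Mi"  # the container is OOM-killed if it exceeds this
    cpu: "500m"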

Operational Errors

A successful deployment doesn’t always mean your application is working. Errors can still occur long after the deployment is complete. These errors often occur within Pods and are recorded in their respective Pod’s log file.

Application Errors

Kubernetes automatically collects logs that containerized applications print to stdout and stderr. Each event is logged to a Pod-specific file, which can be accessed using kubectl logs.
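
To pull those events for a specific Pod (my-pod is a placeholder name here):

$ kubectl logs my-pod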

For example, let’s run a “hello world” Python script with a typo in the print function:

# script.py
prnt("Hello world!")  # intentional typo: should be print()

# Dockerfile
FROM python:3.7.0-stretch
COPY script.py .
CMD ["python", "./script.py"]

$ docker build -t python-example .

# python.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-example
spec:
  selector:
    matchLabels:
      app: python-example
  replicas: 1
  template:
    metadata:
      labels:
        app: python-example
    spec:
      containers:
      - name: python-example
        image: python-example:latest
        imagePullPolicy: IfNotPresent

$ kubectl create -f python.yaml

The Pod builds and runs without any issues, but the application fails with a status of Error. Viewing the Pod’s logs shows why:

Jul 23 13:05:17 python-example default[python-example-6fbdc7bd7c-6wqvn]: Traceback (most recent call last):
...
Jul 23 13:05:17 python-example default[python-example-6fbdc7bd7c-6wqvn]: NameError: name 'prnt' is not defined

No Network Access

Kubernetes uses a powerful overlay network with its own DNS service. It automatically assigns Services a DNS record based on their name and the namespace where they were created. For example, a Service named my-service in the namespace kube-cluster can be reached from another Pod as my-service.kube-cluster.
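
One quick check is to run a DNS lookup from inside any Pod. This is a rough sketch, assuming a Pod named my-pod whose image includes nslookup:

$ kubectl exec -it my-pod -- nslookup my-service.kube-cluster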

If a Pod can’t access another Pod, the DNS service might be failing. The DNS service itself runs as a Pod, which you can access using kubectl:

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                      READY   STATUS  RESTARTS   AGE
kube-dns-86f4d74b45-msw2f   3/3     Running   29      5d

This Pod hosts three containers: kubedns monitors the Kubernetes master for changes to Services and Endpoints; dnsmasq adds caching; and sidecar performs health checks on the service. For example, if the dnsmasq container is unavailable, the sidecar container will print a message like the following:

Jul 26 11:18:14 sidecar kube-system[kube-dns-86f4d74b45-msw2f]: W0726 15:18:14.145436       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:53159->127.0.0.1:53: read: connection refused

Poor Node Health

Kubernetes has a built-in API for monitoring node events called the node problem detector. This deploys a Pod to each node that scans the node’s kernel log for new events.

$ kubectl create -f https://k8s.io/examples/debug/node-problem-detector.yaml
$ kubectl logs --namespace=kube-system node-problem-detector-v0.1-xkhw9
I0725 19:05:31.119316     1 kernel_monitor.go:93] Got system boot time: 2018-07-25 19:03:48.119309654 +0000 UTC
I0725 19:05:31.121793     1 kernel_monitor.go:102] Start kernel monitor

The detector currently won’t take any action in response to events. However, it’s an effective tool for detecting and reporting changes to nodes.

Monitoring Kubernetes Logs in Papertrail

Although kubectl lets you view logs, it doesn’t provide an easy way to centralize or monitor them. With the Papertrail™ solution, you can access logs from across your cluster from a single location, stream events in real-time, or filter events by component, resource, or date range. You can also set up alerts to automatically notify you in case of unexpected problems or unusual behavior.

Sending your Kubernetes logs to Papertrail is easy. With the logspout DaemonSet, Pod logs and master component logs are automatically forwarded with no additional setup. Each event is appended with the resource name (e.g. the Pod or container name) appearing in the System field, and the component that logged it appearing in the Program field.
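
For reference, here’s a condensed, hedged sketch of what such a DaemonSet can look like; the image tag and the Papertrail destination below are placeholders, so use the manifest from the Papertrail setup instructions for your actual deployment:

# logspout-daemonset.yaml (sketch; placeholder image tag and log destination)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logspout
spec:
  selector:
    matchLabels:
      app: logspout
  template:
    metadata:
      labels:
        app: logspout
    spec:
      containers:
      - name: logspout
        image: gliderlabs/logspout:latest
        # Route every container's stdout/stderr to your Papertrail log destination
        args: ["syslog+tls://logsN.papertrailapp.com:XXXXX"]
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
      volumes:
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock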

Creating Alerts

Alerts can provide instant notifications when your cluster displays unusual behavior.

For example, you might want to be notified as soon as the node problem detector reports a new kernel event. You can create a search that filters your logs to those originating from node-problem-detector, then create an alert that triggers on each new event.

Conclusion

Kubernetes logs are rich with useful information about the state of your applications and nodes. Sending these logs to Papertrail can make it easy to monitor, filter, and search through events and troubleshoot problems faster. Sign up for a free trial of Papertrail and start sending your Kubernetes logs to Papertrail today.

The New Papertrail Dashboard

Posted by @coryduncan on

We’ve recently rolled out several long-overdue improvements to the Papertrail™ dashboard. The old dashboard was adequate, but it didn’t do a great job of presenting larger accounts with many groups, searches, or systems. Lack of personalization was another issue. For example, it assumed every user cared equally about every group in the account. The new dashboard addresses these problems with a flexible design tailored to individual preferences.

Favorite groups

As the number of groups increases, it’s difficult to find the specific groups that interest you. With favorite groups, you can now tailor your dashboard to only show those groups.

Favorite a group by toggling the star icon next to the group name.

Traffic lights

Last year we released a feature called “traffic lights” - a visual indicator of event status for each system. Now we’ve brought something similar to the dashboard, showing a traffic light for each group. A yellow or red light means the entire group has been inactive for one hour or one day, respectively. This can be used to ensure a group of systems is still logging after a deployment, or is no longer logging after being decommissioned.

Grid / List toggle

There are two ways to view groups on the dashboard - as a grid or list. The grid view shows expanded information and is enabled by default. The list view, on the other hand, shows a concise view, which can be useful when comparing group attributes for a large number of groups.

Log data transfer usage bar

Being aware of your account’s log data transfer usage is important because each plan includes a specific usage limit. When usage exceeds or gets close to the plan limit, it may be time to add event filters or change plans. Log data transfer usage has always been available in settings, but is now easier to spot at the bottom of the dashboard.

Responsive UI

The new dashboard is designed to work well on any screen size. Use the dashboard from your desktop, mobile device, or anything in between.

We hope you enjoy the new dashboard experience. If you have any feedback, please send it our way.

Get Insights into Azure with Papertrail

Posted by Jennifer Marsh on

When your infrastructure doesn’t offer the scalability to add hardware and applications without huge monetary investments, you can turn to cloud hosting. Microsoft Azure caters to businesses with mainly Windows environments, but its hosting can be difficult to monitor as you scale up resources. As you add more VMs and applications to your cloud, you may struggle to keep track of logs across the entire network. Every time you create a new VM, upload a new application, develop a new website, build a new database, or add any other new resource, Azure produces a variety of logs stored in different locations. This can make it difficult to find the information needed to monitor your services or troubleshoot problems.

Later, we’ll show you how, with Papertrail™, you can stream your application logs directly to a central location. Its aggregated Event Viewer offers targeted monitoring, searches, and live tail functionality.

The Many Types of Azure Logs

Azure creates a number of logs depending on the resource you create. When you add a new database, for instance, Azure generates activity and diagnostic logs that record changes you make from your Azure portal. You can read more about monitoring and logging in the Azure documentation.

Additionally, with every installation of Windows, Event Viewer is included in the operating system to monitor events. Failed logins, security changes, system changes, and application events can be reviewed in Event Viewer. Whenever you host a website or deploy a .NET application to Azure, you have application logs you need to monitor. Monitoring logs from just one application doesn’t require much effort, but when you accumulate dozens across a variety of applications, Papertrail can help you aggregate and stream them to one centralized location.

Azure Log Analysis Tools

When you’re creating an application in Azure, Azure produces activity logs and makes a diagnostic interface available for review from your Azure portal in the Activity Log section of the resource group. The activity log interface provides a general overview of errors and data activity, such as the number of errors.

Azure Screenshot
(© 2018 Microsoft Corporation. All rights reserved.)

Azure has a number of log reports and diagnostic tools, including a PHP log analyzer that can generate a report of errors.

Azure Screenshot
(© 2018 Microsoft Corporation. All rights reserved.)

The output for the diagnostic report looks like the following:

Azure Screenshot
(© 2018 Microsoft Corporation. All rights reserved.)

The diagnostic report displays a list of errors and includes the time the errors occurred and the message.

Using Kudu to Access Logs

Azure has a number of resources that let you download files, and several ways to view and migrate Azure log files. These files are stored as blobs due to their size.

Azure includes an application called Kudu that lets you view files in your browser. Kudu is available in the “Advanced Tools” section of your application resource group.

Image of recent errors
(Kudu© is listed as an Advanced Tool in the application resource group)

Click the CMD menu option in the Debug Console.

Image of Kudu Advanced Tools
(Kudu© menu options)

You then get a list of your files.

Image of Kudu Advanced Tools
(Kudu© file list)

Log files are located in the LogFiles directory.

Note: For a quick shortcut to Kudu, just type the following into your browser:

http://<yoursitename>.scm.azurewebsites.net

Kudu is great to get an idea of how many files you need to export and their sizes, but it’s clunky for file transfers. You can download your log files one by one from this interface, but you probably have several blobs you want to export.

Microsoft provides Azure customers with several options for managing files. Azure Storage Explorer gives you an Explorer-like interface to view and manage files. AzCopy is a command-line utility that lets you move files from an Azure storage location to your local drive or another HTTP directory.

Papertrail Logging Analytics Extends What Azure Offers

With the Papertrail solution, you can aggregate events into one location, then graph, search, monitor, and identify errors within your applications. The screenshot below shows a search based on the virtual machine name. This is useful if you want to view events only from a specific Azure service. Leaving the search phrase in the text box shows events only from that particular VM. The benefit is that if you know a certain VM is acting up, you can filter out the noise from other applications and servers and focus on the VM giving you problems.

Papertrail event viewer
(The Papertrail Event Viewer search with associated graph)

Another benefit is graphing directly from the Papertrail Event Viewer. For instance, suppose you want to know the number of events logged on your VM within the last 30 minutes. Just click the “Graph” icon next to your search and choose the time window in the top-right corner of the graphing section. Papertrail shows you an interactive graph with your logged events based on your search.

Send Application and Windows Event Logs to Papertrail

Enterprise organizations usually have several Azure applications spanning multiple resource groups in Azure. With Papertrail and NXLog, you can aggregate logs across multiple resource groups. Just connect your Windows event logs to the Papertrail service using the free NXLog agent. NXLog monitors the Windows event log, so any operational or application events logged to your Windows event log are passed to Papertrail.

The first step is to configure NXLog on your virtual server. Don’t forget to restart the NXLog service after you’ve installed it so it starts pushing logs to Papertrail. Once the service has been restarted, go to the Papertrail Event Viewer and you’ll see Windows events listed as they are logged on your VM.

Papertrail event viewer
(Windows Azure events in the Papertrail Event Viewer)

You can also stream application events from your own logs by pointing NXLog to “watch files.” In the NXLog “nxlog.conf” file, you’ll find this element commented out.

# Monitor application log files
<Input watchfile>
  Module im_file
  File 'C:\\path\\to\\*.log'
  Exec $Message = $raw_event;
  Exec if file_name() =~ /.*\\(.*)/ $SourceName = $1;
  SavePos TRUE
  Recursive TRUE
</Input>

Change the File path to include your own custom log file for NXLog to watch and push to Papertrail. Here is an example:

# Monitor a single application log file
<Input watchfile2>
  Module im_file
  File 'C:\\Papertrail\\test.log'
  Exec $Message = $raw_event;
  Exec if file_name() =~ /.*\\(.*)/ $SourceName = $1;
  SavePos TRUE
  Recursive TRUE
</Input>

In the above configuration, any time changes are made to C:\Papertrail\test.log, the new events are sent to Papertrail. Note that the escaped (double) backslashes are important in these path configurations; if they are missing, the process will fail.

If you have a .NET development environment, you can use log4net with Visual Studio. After it’s installed, configure it for Papertrail, and it will work alongside your Visual Studio environment to log errors and exceptions.

Sending Azure Diagnostic Logs

For diagnostic logs, you need an agent such as Logstash or Fluentd to pull your logs from an Azure storage location and stream them to Papertrail. Before you start, you need to configure an Azure Event Hub to aggregate your logs. Also, give permission for Azure to provide access to the logs.

After you set up a hub, you need to set up an input plugin to read from Azure. There are several listed on the Fluentd plugin page and Logstash plugin page. For Papertrail, you then configure the remote syslog plugin to send the logs to Papertrail. For Logstash, you can use the syslog output plugin. Configure them to send logs to the log output destination shown in your Log Destinations account page.
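
As an illustration, here’s a minimal hedged sketch of the Logstash output side, assuming the logstash-output-syslog plugin is installed; the host and port are placeholders for the values shown on your Log Destinations page:

# logstash.conf (output section only)
output {
  syslog {
    host     => "logsN.papertrailapp.com"  # placeholder Papertrail destination host
    port     => XXXXX                      # placeholder destination port
    protocol => "ssl-tcp"                  # send syslog over TLS
    facility => "user-level"
    severity => "informational"
  }
}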

Conclusion

Azure can be beneficial for any organization that needs to scale fast and is not looking to build out their own internal infrastructure. Monitoring a vast array of portal applications, however, can introduce some challenges. Papertrail can help by extending what Azure offers and helping you to aggregate event logs into a consolidated view that simplifies troubleshooting. Papertrail is designed to allow you to easily search, archive, filter, live tail, and graph application logs, which can make issue resolution across applications much easier. Sign up for a free trial today.

Simplicity in Dev Tools is a Lost Art

Posted by Jason Skowronski on

I remember the days when I’d develop using simple Linux command line tools. When I worked at Amazon almost 10 years ago I used a lot of old-school tools like Vim, ssh, and grep. It took some time to get familiar with them, but I figured them out just by reading a manual page or watching my coworkers. For better or worse, developer and ops tools are getting so complex we have to read books to become proficient. Newer tools offer more features and scalability, but it comes at the expense of complexity to learn and manage. How should we decide when to stick with simple tools and when to invest in more powerful or complex ones?

Complexity Creep

The search for development bliss has made our toolchains more complex. Instead of using a simple terminal text editor like Vim, now developers use ones like Atom or VS Code that include hundreds of modules. Package managers like NPM pull hundreds or thousands of transitive dependencies of often unknown origin. Instead of writing JavaScript directly, now we write in dozens of languages from ES6 to CoffeeScript to C++, all transpiled back to JS.

On the infrastructure side, the cloud software revolution freed us from managing hardware, but now we manage an even more complex set of microservices and orchestration tools. Microservices run inside frameworks… on app servers… which run inside containers… on Kubernetes… in the cloud… configured by Terraform. Monitoring these services is no easy task either. Instead of using simple text files for logs, solutions like Elastic Stack shard data across a cluster of log servers, and agents trace requests across distributed nodes.

This is just a small slice of the technologies developers need to learn. Take a look at this cloud native landscape from the Cloud Native Computing Foundation (CNCF).

image of cloud native landscape
© 2018 Cloud Native Computing Foundation. All rights reserved. CNCF Cloud Native Interactive Landscape

This is impressive, but can you imagine being a beginner and having to learn all these technologies? It’s enough to make your head spin! Every solution you add to your stack requires extra time to set up and maintain, extra time to learn and train employees, and extra mental space. You have to spend days comparing the pros and cons of different solutions, and when something goes wrong, you have to deal with all of that complexity to fix it. It makes one yearn for the days of simpler tools.

Innovators Should Value Agility

Too often I see startups adopting the latest buzzword technology, whether it’s a new JavaScript framework on the front end or an orchestration framework on the back end. Usually these new technologies are more complex and introduce more risk than the less exciting and mature ones.

The vast majority of new products don’t need enterprise-scale infrastructure right away. Using overly complex solutions too early can lead to technical debt as your needs change. Creating a minimum viable product (MVP) isn’t the only way to stay lean; also consider your minimum viable infrastructure. Invest in solutions that are simple initially, and swap them out as your needs change.

The most important thing startups and innovators need to be successful is AGILITY. If you are still reaching product market fit or pivoting your solution, you need to quickly validate the hypotheses for your product with minimal waste or overhead. The needs for your project and architecture will inevitably change as you adapt to market needs. To be agile, you need tools that allow for fast iteration and that can be quickly adopted and changed when needed.

Use the Right Tool for the Job

When you don’t need complexity, take advantage of simpler dev tools. If I have to make a quick text edit, I’ll use a simple and fast text editor like Vim. If I’m diving into a complex project, I’ll take the time to load the more complex VS Code or Atom.

When starting a new project, think about using a managed platform like Heroku so infrastructure is out of the way. Setting up an EC2 server and load balancer might not seem hard, but now imagine automating deployments several times per day with CI/CD, hosting different versions, and scaling it, and you see how complexity starts to add up.
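For comparison, getting a new app running on Heroku is only a few commands. This is a minimal sketch, assuming the Heroku CLI is installed and my-app is a placeholder name:

  $ heroku create my-app        # provision the app on Heroku
  $ git push heroku master      # deploy the current code
  $ heroku ps:scale web=1       # run a single web dyno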

Also, keep your monitoring tools simple so your ops team isn’t bogged down setting them up. Tools like Splunk and Elastic Stack are powerful, but they take more time to set up: a cluster, the agents, the search indexers, and the front-end UI.

When you’re starting a new project, Papertrail™ is an easy logging solution to use. It handles log management for you in the cloud so you don’t need to manage a log server. You don’t need to install agents, and you’ll be done typically in 45 seconds. Its live tail mode looks like what you’d see in your Linux terminal—and it works nearly as fast!
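If you're curious what that setup looks like, on a typical Linux host it's a single rsyslog forwarding rule pointed at the host and port from your Log Destinations page. A minimal sketch, where logsN.papertrailapp.com and XXXXX are placeholders (a single @ forwards over UDP, @@ over TCP):

  $ echo "*.*    @logsN.papertrailapp.com:XXXXX" | sudo tee -a /etc/rsyslog.conf
  $ sudo service rsyslog restart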

Simplicity in your dev tools and infrastructure can clear space in your brain. Don’t waste too much time on fancy tools early on. Save your brain space for things that really matter to your business, like building a better product and helping your customers.

Modern Apps Make Log Aggregation More Important Than Ever

Posted by Jennifer Marsh on

With the popularity of microservices, cloud integration, and containers, the distribution of log files can get out of hand. If you have several dozen applications distributed across the cloud, it gets difficult to aggregate and review logs when something goes wrong. When you distribute applications in this way, log aggregation is more important than ever to quickly analyze and fix problems.

Imagine a scenario where one of those applications crashes and you need to find the cause and fix it. Operations administrators and developers have to dig across the network to find the right log that gives them the right answer. Without log aggregation, this can add hours to analysis, and every minute counts as downtime persists and damages your customer experience.

Distributed Apps and Logging Integration

Microservices

The concept of microservices changes the traditional way coders build applications. Instead of one monolithic codebase, small autonomous services are built around particular functions. Since each of these microservices has its own codebase, it also has its own logs. When one microservice crashes, it can affect others, making it difficult to track down bugs.

Containers

Instead of one monolithic codebase, the system is built from small modular components deployed in containers. Unlike virtual machines, containers don't bring their own operating system; they depend on the host's. A platform like Docker has built-in support for capturing container output as JSON files, but aggregating those logs for analysis is left to the operator.
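If you want container output to flow somewhere central as it's produced, Docker's logging drivers can forward it for you. A minimal sketch using the syslog driver, where the host and port are placeholders for a destination such as Papertrail:

  $ docker run --log-driver syslog \
      --log-opt syslog-address=udp://logsN.papertrailapp.com:XXXXX \
      --log-opt tag="{{.Name}}" \
      my-image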

Serverless architecture

Your developers no longer need to focus on the infrastructure that hosts the application. They deploy functions to a provider like AWS and let the cloud run them. This removes much of the hardware and configuration overhead for development projects, but it means logs are stored with the cloud provider hosting the architecture, for example in CloudWatch Logs.
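Pulling those logs back out typically means going through the provider's tooling. For example, with the AWS CLI you can filter a Lambda function's CloudWatch Logs group (my-function is a placeholder name):

  $ aws logs filter-log-events \
      --log-group-name /aws/lambda/my-function \
      --filter-pattern "ERROR"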

Multi-cloud distribution

AWS, Azure, Google Cloud, DigitalOcean, the list goes on. You may even have a hybrid model with your own on-premises private cloud. After a critical production issue, your administrators must download and combine logs from each provider to get a holistic view of the health of your infrastructure.

Edge computing

Edge computing allows you to run functions close to the client to reduce latency and bandwidth needs. This model is common for CDNs that use edge servers at data centers across the globe. Each server delivers content based on the user’s location, speeding up content delivery. However, each one also creates its own logs which need to be centralized for analysis.

IoT and mobile computing

Apps deployed to IoT and mobile devices have their own sets of logs, which stay on the device or are deleted from it. Without a centralized logging or crash reporting solution, your support team must ask the customer to manually send logs for troubleshooting, which is cumbersome and slows time to resolution.

Log Aggregation is More Important Than Ever

Aggregating logs is important because it’s not always obvious which system is the culprit. Administrators must comb through logs on separate platforms to find an error that points them toward a solution. Even when the root of the problem is found, a domino effect may already have corrupted data or caused failures in other applications and services. Repairing a suite of applications can take weeks when logs are fragmented across several systems.

With logs scattered across each service’s location, administrators and developers can’t get a full picture. The logs must be collected, transferred to a centralized storage location, and then analyzed. Only then can transactions across your infrastructure be traced downstream to the service that’s the root cause.

The answer to this fragmented logging issue is to provide one pane of glass in a centralized location that lets you see your entire application environment.

Papertrail and Log Aggregation

Papertrail™ creates that single pane of glass, letting you view all logs in one central environment. Now you have one place to search, review, skim, and analyze. No more SSHing into one server at a time or manually copying files from multiple locations.

By aggregating your logs in one location, you can debug faster and even analyze them interactively in real time with the live tail service. Seek by time or context, and use color coding to organize events and quickly review issues by framework or language.

The Papertrail solution’s log velocity analytics answers the question “How often does this happen?” Find trends in your bugs so you can stop them before they become persistent errors.

(Papertrail log velocity analytics)

With quick setup, you can aggregate your log files and create a frustration-free monitoring and analysis environment for administrators, whether they work with 2 or 2,000 servers. It’s a fit for any organization with distributed apps that needs answers quickly when a mission-critical app fails.

Search history: access recent searches in Papertrail

Posted by @rpheath on

When troubleshooting, a single search can become a theme with variations. Even if it’s not worth saving, it’s worth remembering, for a while. Until today, Papertrail didn’t make getting back to recent searches especially easy, but that’s changing.

Today we’re excited to release a new feature that provides user-specific access to search history.

How it works

There are two ways to access search history in Papertrail’s Event Viewer:

By clicking

The magnifying glass in the search input now toggles the search history view.

Once the search history is opened, click on the search you want to perform.

Keyboard shortcuts

We designed search history to be an extension of the search input, so it felt natural to support a keyboard workflow.

When the search box has focus, press the down arrow key to open the search history. Once opened, the list can be navigated using the up and down arrow keys. When the right search is highlighted, press Enter to execute it.

When a query is close, but requires a small tweak, it can be tedious to rewrite the entire query just for that little adjustment. With search history, any recent search can be copied down to the search box for modification.

Right now, a search can only be saved in the event viewer if it’s currently active. With search history, a recent search can be saved any time, without needing to load it first. This keeps the focus on troubleshooting, without interruptions to save searches.

What do you think?

Not all searches need to be saved, but that doesn’t mean those searches aren’t important. With search history, they’re only an arrow away.

We hope you find search history useful. Give it a try, and let us know if you have any questions or feedback. Enjoy!

Lightning Search & Log Velocity Analytics

Posted by @jshawl on

We’ve been working on a few new features that will dramatically increase the speed at which logs are searched and enhance visibility into log volume. These features are immediately available to new customers and will be rolled out to existing customers in the coming weeks.

Lightning Search changes the way Papertrail™ stores and searches logs, resulting in a dramatic increase in search speed. This new architecture also enables us to release a new feature: Log Velocity Analytics. Understanding log volume has never been easier. While viewing logs, click the graph icon to visually explore log data. The graph reflects the current search, group, or system.

Whether you’re trend spotting over a week, or diagnosing a spike in the last 10 minutes, Log Velocity Analytics reduces the time needed to understand the data in your logs.

Set up remote_syslog2 quickly with official Chef, Puppet, and Salt recipes

Posted by @lyspeth on

Papertrail now officially supports automated setup of the remote_syslog2 logging daemon with Chef, Puppet, and Salt.

All three support Ubuntu, Amazon Linux, CentOS/RHEL, and Debian 9. (A previous version of this post mentioned an upstream issue with Puppet and Debian 9 that has now been resolved.)

The Chef cookbook supports Chef 12.21.x and 13.x, and the Puppet module has been tested with Puppet 4.x and 5.x.

Get set up easily by providing:

  • the account’s log destination information
  • a list of files to monitor
  • any desired config options for typical remote_syslog2 operation

If your config strategy of choice is cookbooks, modules, or formulas, try these out. Tell us if you run into any speed bumps.
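Whichever tool you use, the end result is a remote_syslog2 configuration built from the values you provide. A sketch of that configuration, with placeholder file paths, host, and port:

  files:
    - /var/log/nginx/access.log
    - /var/log/myapp/*.log
  destination:
    host: logsN.papertrailapp.com
    port: XXXXX
    protocol: tls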

Hello cron job monitoring & alerts, goodbye silent failures

Posted by @coryduncan on

Papertrail has had the ability to alert on searches that match events for years, but what about when they don’t? When a cron job, backup, or other recurring job doesn’t run, it’s not easy to notice the absence of an expected message. But now, Papertrail can do the noticing for you. Today we’re excited to release inactivity alerts, offering the ability to alert when searches don’t match events.

Set up an inactivity alert

From the create/edit alert form, choose “Trigger when no new events match.”

Inactivity alerts

Once saved, the alert will send notifications when there are no matching events within the chosen time period. Use this for:

  • cron jobs
  • background jobs which should run nearly all the time, like system monitors/pollers and database or offsite backups
  • lower-frequency scheduled jobs, like nightly billing

Try it

If you have cron jobs, backup jobs, or other recurring or scheduled jobs, they almost certainly already generate logs. Here’s how to have Papertrail tell you when they don’t run or run but don’t complete successfully:

  1. Search for the message emitted when a cron job finishes successfully (example: cron)
  2. Click “Save Search”
  3. Attach an alert, such as one that notifies a Slack channel or sends an email.

No logs? No problem.

Very rarely, a recurring job doesn’t generate log messages on its own. For those, use the shell && operator and logger to generate a post-success log message. For example, ./run-billing-job && logger "billing succeeded" will send the message billing succeeded to syslog if and only if run-billing-job finishes with a successful exit code. Use "billing succeeded" as the Papertrail search.
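In a crontab, that pattern might look like the following sketch, where the schedule, path, and tag are placeholders. The -t flag tags the message so it's easy to match in the saved search:

  30 2 * * * /usr/local/bin/run-billing-job && logger -t billing "billing succeeded"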

What do you think?

Give inactivity alerts a try and if you have questions or feedback, let us know.

Green means go(od): Spot inactive log senders at a glance

Posted by @lyspeth on

Ever wanted to quickly see which systems haven’t sent logs recently? Now it’s as easy as checking a traffic light. Visit the Dashboard and click a group name, then scan the list of systems:

Systems with activity status

  • Green-light systems are currently sending logs
  • Yellow-light systems aren’t currently sending logs, but have sent logs in the last 24 hours
  • Red-light systems haven’t sent logs in the last 24 hours

Text on the right shows when Papertrail last saw logs.

It’s an easy way to make sure critical systems are still logging after a deployment or upgrade.

Try it out, and if you see anything unusual, or just want to opine on the intervals or colors, tell us.

Advanced event viewer keyboard shortcuts

Posted by @coryduncan on

Today we’re excited to release two new keyboard shortcuts within the event viewer:

Highlight and link to multiple events

When a series of events is relevant, it can be useful to share those events with teammates. This is now possible.

Holding Shift will put the event viewer into “selection mode.” While holding down Shift:

  • Start a selection by clicking the selection button next to an event.
  • Select a range by clicking a selection button above or below an existing selected event.

Event selection

The browser URL will update to indicate which events are selected. From there it’s as easy as copying and pasting the link. Clear a selection by pressing Esc.

Retain search query when clicking

Extending the idea of flexible context to the entire event viewer, holding Alt while clicking a link will retain (instead of replace) the current search query. This works for orange and blue context links as well as click-to-search.

Retain search query

To see these and all other keyboard shortcuts, press ? while in the event viewer. Try out the new shortcuts and as always, let us know if we can do better. Enjoy!

Use Zapier to send logs anywhere

Posted by @lyspeth on

Papertrail’s search alerts are great, but what happens when you need a specialized integration, or want to grab something other than raw messages and counts – like particular fields from a message, or data to analyze later?

Now, a Papertrail alert can invoke a Zapier webhook trigger, which can then perform any desired action in Zapier. This example Zap setup sends data on printer service behavior to a Google Sheet for later analysis.

Set up the Zap

Sign up for Zapier, or log in.

Create a new Zap. Under Built-In Apps, select Webhooks by Zapier, then Catch Hook, and save the new Zap.

create_zap.gif

In the Pick off a Child Key dialog that appears:

child_key_dialog.png

Enter payload.events to get to the event details, then click Continue to show the Zap’s webhook URL.
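If you'd like to sanity-check the Catch Hook before wiring up the alert, you can post a sample body to the webhook URL with curl. The URL and event fields below are placeholders, and the exact shape of Papertrail's webhook payload beyond the top-level payload.events key is an assumption here:

  $ curl -X POST "https://hooks.zapier.com/hooks/catch/XXXXX/YYYYY/" \
      -H "Content-Type: application/json" \
      -d '{"payload":{"events":[{"received_at":"2018-08-09T06:24:04Z","source_name":"print-server","program":"cups","message":"printer offline"}]}}'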

Set up the alert

Create a saved search to find the lines of interest, then add a Zapier alert integration. Grab the webhook URL and paste it into the alert, then save.

setup_zap_alert.gif

Process data

Create a Google spreadsheet with column names for the relevant fields: received_at, source_name, program, message.

printer status data

Then test the data-sending step by clicking Send test data on the Papertrail alert and, in Zapier, clicking OK, I did this.

send_test_data.gif

Once the test has succeeded, set up the action. Select the Google Sheets app, then Create Spreadsheet Row. Select the spreadsheet and worksheet created earlier, and fill in the fields with selections from the payload.

set_up_sheets.gif

Voila! A spreadsheet that will dynamically update with details from the matched events when the Papertrail alert fires.

If custom alerts going right into your app of choice sounds great, give it a try and let us know your thoughts.