I'm on Windows and my text files are detected as binary.
Git works best when you store text files as UTF-8. Many
programs on Windows support UTF-8, but some do not and only
use the little-endian UTF-16 format, which Git detects as
binary. If you can't use UTF-8 with your programs, you can
specify a working tree encoding that indicates which encoding
your files should be checked out with, while still storing
these files as UTF-8 in the repository. This allows tools
like git-diff(1) to work as expected, while still allowing
your tools to work.
To do so, you can specify a gitattributes(5) pattern with the
working-tree-encoding attribute. For example, the following
pattern sets all C files to use UTF-16LE-BOM, which is a
common encoding on Windows:
*.c working-tree-encoding=UTF-16LE-BOM
You will need to run git add --renormalize to have this take
effect. Note that if you are making these changes on a
project that is used across platforms, you'll probably want
to make them in a per-user configuration file or in the one in
$GIT_DIR/info/attributes, since making them in a .gitattributes
file in the repository will apply to all users of the
repository.
See the following entry for information about normalizing
line endings as well, and see gitattributes(5) for more
information about attribute files.
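As a hedged sketch of the per-clone approach described above,
the following commands append the pattern to the clone's own
attributes file and ask Git which encoding it would apply. The
scratch repository and the file name main.c are assumptions
made only so that the example is self-contained; in a real
project you would run the middle commands inside your existing
clone.

```shell
# Work in a throwaway repository so the example is self-contained (assumption).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Resolve $GIT_DIR/info/attributes for this clone and append the pattern there,
# so it is never committed and does not affect other users of the project.
attrs=$(git rev-parse --git-path info/attributes)
mkdir -p "$(dirname "$attrs")"
echo '*.c working-tree-encoding=UTF-16LE-BOM' >> "$attrs"

# Ask Git which encoding it will use for a hypothetical file named main.c.
encoding=$(git check-attr working-tree-encoding -- main.c)
echo "$encoding"
```

Once the attribute is reported as expected, git add --renormalize
still has to be run for files that are already tracked.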
I'm on Windows and git diff shows my files as having a ^M at the
end.
By default, Git expects files to be stored with Unix line
endings. As such, the carriage return (^M) that is part of a
Windows line ending is shown because it is considered to be
trailing whitespace. Git defaults to showing trailing
whitespace only on new lines, not existing ones.
You can store the files in the repository with Unix line
endings and convert them automatically to your platform's
line endings. To do that, set the configuration option
core.eol to native and see the following entry for
information about how to configure files as text or binary.
You can also control this behavior with the core.whitespace
setting if you don't wish to remove the carriage returns from
your line endings.
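The two options above might be set as follows. This is a
minimal sketch: the scratch repository is an assumption used to
avoid touching any real configuration, and cr-at-eol is the
core.whitespace value that treats a carriage return at the end
of a line as part of the line terminator.

```shell
# Work in a throwaway repository so no real configuration is changed (assumption).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Option 1: check text files out with the platform's native line endings.
git config core.eol native

# Option 2: keep carriage returns in the files, but stop treating a CR at the
# end of a line as trailing whitespace.
git config core.whitespace cr-at-eol

# Read the values back to confirm what was stored.
eol=$(git config core.eol)
ws=$(git config core.whitespace)
echo "core.eol=$eol core.whitespace=$ws"
```

In an existing clone, only the two git config commands are
needed; add --global to apply them to every repository you use.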
Why do I have a file that's always modified?
Internally, Git always stores file names as sequences of
bytes and doesn't perform any encoding or case folding.
However, Windows and macOS by default both perform case
folding on file names. As a result, it's possible to end up
with multiple files or directories whose names differ only in
case. Git can handle this just fine, but the file system can
store only one of these files, so when Git reads the other
file to see its contents, it looks modified.
It's best to remove one of the files such that you only have
one file. You can do this with commands like the following
(assuming two files AFile.txt and afile.txt) on an otherwise
clean working tree:
$ git rm --cached AFile.txt
$ git commit -m 'Remove files conflicting in case'
$ git checkout .
This avoids touching the disk, but removes the additional
file. Your project may prefer to adopt a naming convention,
such as all-lowercase names, to prevent this problem from
occurring again; such a convention can be checked using a
pre-receive hook or as part of a continuous integration (CI)
system.
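One way such a check could look in a CI job is sketched below.
The function name find_case_collisions is hypothetical, and the
comparison lowercases names with tr, which is only an
approximation of the case folding a file system performs
(especially for non-ASCII names).

```shell
# Report names that occur more than once when case is ignored.
# Reads one path per line on standard input; returns 1 if any collide.
find_case_collisions() {
    dupes=$(tr '[:upper:]' '[:lower:]' | sort | uniq -d)
    if [ -n "$dupes" ]; then
        printf 'Paths that collide when case is ignored:\n%s\n' "$dupes" >&2
        return 1
    fi
    return 0
}

# Typical use inside a repository: git ls-files | find_case_collisions
# Demonstration on literal input so the sketch runs anywhere:
if printf 'AFile.txt\nafile.txt\nREADME\n' | find_case_collisions 2>/dev/null; then
    result=clean
else
    result=collision
fi
echo "$result"
```

A CI job can fail the build on a non-zero return value, which
catches the conflict before it reaches users on case-folding
file systems.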
It is also possible for perpetually modified files to occur
on any platform if a smudge or clean filter is in use on your
system but a file was previously committed without running
the smudge or clean filter. To fix this, run the following on
an otherwise clean working tree:
$ git add --renormalize .
What's the recommended way to store files in Git?
While Git can store and handle any file of any type, there
are some settings that work better than others. In general,
we recommend that text files be stored in UTF-8 without a
byte-order mark (BOM) with LF (Unix-style) endings. We also
recommend the use of UTF-8 (again, without BOM) in commit
messages. These are the settings that work best across
platforms and with tools such as git diff and git merge.
Additionally, if you have a choice between storage formats
that are text based or non-text based, we recommend storing
files in the text format and, if necessary, transforming them
into the other format. For example, a text-based SQL dump
with one record per line will work much better for diffing
and merging than an actual database file. Similarly,
text-based formats such as Markdown and AsciiDoc will work
better than binary formats such as Microsoft Word and PDF.
Likewise, storing binary dependencies (e.g., shared
libraries or JAR files) or build products in the repository
is generally not recommended. Dependencies and build products
are best stored on an artifact or package server with only
references, URLs, and hashes stored in the repository.
We also recommend setting a gitattributes(5) file to
explicitly mark which files are text and which are binary. If
you want Git to guess, you can set the attribute text=auto.
For example, the following might be appropriate in some
projects:
# By default, guess.
* text=auto
# Mark all C files as text.
*.c text
# Mark all JPEG files as binary.
*.jpg binary
These settings help tools pick the right format for output
such as patches and result in files being checked out in the
appropriate line ending for the platform.
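To see how Git interprets attributes like the ones above,
git check-attr can be queried for individual paths. The
following is a minimal sketch in a scratch repository; the file
names main.c, photo.jpg, and notes.md are hypothetical and do
not need to exist.

```shell
# Work in a throwaway repository so the example is self-contained (assumption).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Write the example attributes from the text above.
cat > .gitattributes <<'EOF'
* text=auto
*.c text
*.jpg binary
EOF

# Query the text attribute for three hypothetical paths.
c_attr=$(git check-attr text -- main.c)
jpg_attr=$(git check-attr text -- photo.jpg)
other_attr=$(git check-attr text -- notes.md)
printf '%s\n' "$c_attr" "$jpg_attr" "$other_attr"
```

The C file should be reported as text: set, the JPEG as
text: unset (binary is a macro that unsets text, diff, and
merge), and any other file as text: auto. For files that are
already tracked, git ls-files --eol additionally shows the line
endings found in the index and in the working tree.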